COMPAQ


Compaq Confidential 


21264A Specifications 


Available Internally at: FTP://LOCOST.HLO.DEC.COM/21264A_EV67/DS-0014B-TE


This document specifies the Alpha microprocessor that is known internally 
as EV67. 


Revision/Update Information: Revision 1.1, May 1999 


Compaq Computer Corporation 


Compaq Confidential 





May 1999 
The information in this publication is subject to change without notice. 


COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL
ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES
RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. THIS
INFORMATION IS PROVIDED “AS IS” AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY
WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST
INFRINGEMENT.


This publication contains information protected by copyright. No part of this publication may be photocopied or 
reproduced in any form without prior written consent from Compaq Computer Corporation. 


© 1999 Digital Equipment Corporation. 
All rights reserved. Printed in the U.S.A. 


COMPAQ, the Compaq logo, the Digital logo, and VAX Registered in United States Patent and Trademark Office.
Pentium is a registered trademark of Intel Corporation. 


Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.


Compaq Confidential 
21264A Revision 1.1 — Subject to Change 





Table of Contents 


Preface 


1 Introduction


1.1 The Architecture
1.1.1 Addressing
1.1.2 Integer Data Types
1.1.3 Floating-Point Data Types
1.2 21264A Microprocessor Features


2 Internal Architecture


2.1 21264A Microarchitecture
2.1.1 Instruction Fetch, Issue, and Retire Unit
2.1.1.1 Virtual Program Counter Logic
2.1.1.2 Branch Predictor
2.1.1.3 Instruction-Stream Translation Buffer
2.1.1.4 Instruction Fetch Logic
2.1.1.5 Register Rename Maps
2.1.1.6 Integer Issue Queue
2.1.1.7 Floating-Point Issue Queue
2.1.1.8 Exception and Interrupt Logic
2.1.1.9 Retire Logic
2.1.2 Integer Execution Unit
2.1.3 Floating-Point Execution Unit
2.1.4 External Cache and System Interface Unit
2.1.4.1 Victim Address File and Victim Data File
2.1.4.2 I/O Write Buffer
2.1.4.3 Probe Queue
2.1.4.4 Duplicate Dcache Tag Array
2.1.5 Onchip Caches
2.1.5.1 Instruction Cache
2.1.5.2 Data Cache
2.1.6 Memory Reference Unit
2.1.6.1 Load Queue
2.1.6.2 Store Queue
2.1.6.3 Miss Address File
2.1.6.4 Dstream Translation Buffer
2.1.7 SROM Interface
2.2 Pipeline Organization
2.2.1 Pipeline Aborts
2.3 Instruction Issue Rules


2.3.1 Instruction Group Definitions
2.3.2 Ebox Slotting
2.3.3 Instruction Latencies
2.4 Instruction Retire Rules
2.4.1 Floating-Point Divide/Square Root Early Retire
2.5 Retire of Operate Instructions into R31/F31
2.6 Load Instructions to R31 and F31
2.6.1 Normal Prefetch: LDBU, LDF, LDG, LDL, LDT, LDWU Instructions
2.6.2 Prefetch with Modify Intent: LDS Instruction
2.6.3 Prefetch, Evict Next: LDQ Instruction
2.7 Special Cases of Alpha Instruction Execution
2.7.1 Load Hit Speculation
2.7.2 Floating-Point Store Instructions
2.7.3 CMOV Instruction
2.8 Memory and I/O Address Space Instructions
2.8.1 Memory Address Space Load Instructions
2.8.2 I/O Address Space Load Instructions
2.8.3 Memory Address Space Store Instructions
2.8.4 I/O Address Space Store Instructions
2.9 MAF Memory Address Space Merging Rules
2.10 Instruction Ordering
2.11 Replay Traps
2.11.1 Mbox Order Traps
2.11.1.1 Load-Load Order Trap
2.11.1.2 Store-Load Order Trap
2.11.2 Other Mbox Replay Traps
2.12 I/O Write Buffer and the WMB Instruction
2.12.1 Memory Barrier (MB/WMB/TB Fill Flow)
2.12.1.1 MB Instruction Processing
2.12.1.2 WMB Instruction Processing
2.12.1.3 TB Fill Flow
2.13 Performance Measurement Support - Performance Counters
2.14 Floating-Point Control Register
2.15 AMASK and IMPLVER Instruction Values
2.15.1 AMASK
2.15.2 IMPLVER
2.16 Design Examples


3 Hardware Interface


3.1 21264A Microprocessor Logic Symbol
3.2 21264A Signal Names and Functions
3.3 Pin Assignments
3.4 Mechanical Specifications
3.5 21264A Packaging


4 Cache and External Interfaces


4.1 Introduction to the External Interfaces
4.1.1 System Interface
4.1.1.1 Commands and Addresses
4.1.2 Second-Level Cache (Bcache) Interface
4.2 Physical Address Considerations
4.3 Bcache Structure
4.3.1 Bcache Interface Signals
4.3.2 System Duplicate Tag Stores


4.4 Victim Data Buffer
4.5 Cache Coherency
4.5.1 Cache Coherency Basics
4.5.2 Cache Block States
4.5.3 Cache Block State Transitions
4.5.4 Using SysDc Commands
4.5.5 Dcache States and Duplicate Tags
4.6 Lock Mechanism
4.6.1 In-Order Processing of LDx_L/STx_C Instructions
4.6.2 Internal Eviction of LDx_L Blocks
4.6.3 Liveness and Fairness
4.6.4 Managing Speculative Store Issues with Multiprocessor Systems
4.7 System Port
4.7.1 System Port Pins
4.7.2 Programming the System Interface Clocks
4.7.3 21264A-to-System Commands
4.7.3.1 Bank Interleave on Cache Block Boundary Mode
4.7.3.2 Page Hit Mode
4.7.4 21264A-to-System Commands Descriptions
4.7.5 ProbeResponse Commands (Command[4:0] = 00001)
4.7.6 SysAck and 21264A-to-System Commands Flow Control
4.7.7 System-to-21264A Commands
4.7.7.1 Probe Commands (Four Cycles)
4.7.7.2 Data Transfer Commands (Two Cycles)
4.7.8 Data Movement In and Out of the 21264A
4.7.8.1 21264A Clock Basics
4.7.8.2 Fast Data Mode
4.7.8.3 Fast Data Disable Mode
4.7.8.4 SysDataInValid_L and SysDataOutValid_L
4.7.8.5 SysFillValid_L
4.7.8.6 Data Wrapping
4.7.9 Nonexistent Memory Processing
4.7.10 Ordering of System Port Transactions
4.7.10.1 21264A Commands and System Probes
4.7.10.2 System Probes and SysDc Commands
4.8 Bcache Port
4.8.1 Bcache Port Pins
4.8.2 Bcache Clocking
4.8.2.1 Setting the Period of the Cache Clock
4.8.3 Bcache Transactions
4.8.3.1 Bcache Data Read and Tag Read Transactions
4.8.3.2 Bcache Data Write Transactions
4.8.3.3 Bubbles on the Bcache Data Bus
4.8.4 Pin Descriptions
4.8.4.1 BcAdd_H[23:4]
4.8.4.2 Bcache Control Pins
4.8.4.3 BcDataInClk_H and BcTagInClk_H
4.8.5 Bcache Banking
4.8.6 Disabling the Bcache for Debugging
4.9 Interrupts


5 Internal Processor Registers


5.1 Ebox IPRs
5.1.1 Cycle Counter Register - CC
5.1.2 Cycle Counter Control Register - CC_CTL
5.1.3 Virtual Address Register - VA


5.1.4 Virtual Address Control Register - VA_CTL
5.1.5 Virtual Address Format Register - VA_FORM
5.2 Ibox IPRs
5.2.1 ITB Tag Array Write Register - ITB_TAG
5.2.2 ITB PTE Array Write Register - ITB_PTE
5.2.3 ITB Invalidate All Process (ASM=0) Register - ITB_IAP
5.2.4 ITB Invalidate All Register - ITB_IA
5.2.5 ITB Invalidate Single Register - ITB_IS
5.2.6 ProfileMe PC Register - PMPC
5.2.7 Exception Address Register - EXC_ADDR
5.2.8 Instruction Virtual Address Format Register - IVA_FORM
5.2.9 Interrupt Enable and Current Processor Mode Register - IER_CM
5.2.10 Software Interrupt Request Register - SIRR
5.2.11 Interrupt Summary Register - ISUM
5.2.12 Hardware Interrupt Clear Register - HW_INT_CLR
5.2.13 Exception Summary Register - EXC_SUM
5.2.14 PAL Base Register - PAL_BASE
5.2.15 Ibox Control Register - I_CTL
5.2.16 Ibox Status Register - I_STAT
5.2.17 Icache Flush Register - IC_FLUSH
5.2.18 Icache Flush ASM Register - IC_FLUSH_ASM
5.2.19 Clear Virtual-to-Physical Map Register - CLR_MAP
5.2.20 Sleep Mode Register - SLEEP
5.2.21 Process Context Register - PCTX
5.2.22 Performance Counter Control Register - PCTR_CTL
5.3 Mbox IPRs
5.3.1 DTB Tag Array Write Registers 0 and 1 - DTB_TAG0, DTB_TAG1
5.3.2 DTB PTE Array Write Registers 0 and 1 - DTB_PTE0, DTB_PTE1
5.3.3 DTB Alternate Processor Mode Register - DTB_ALTMODE
5.3.4 Dstream TB Invalidate All Process (ASM=0) Register - DTB_IAP
5.3.5 Dstream TB Invalidate All Register - DTB_IA
5.3.6 Dstream TB Invalidate Single Registers 0 and 1 - DTB_IS0,1
5.3.7 Dstream TB Address Space Number Registers 0 and 1 - DTB_ASN0,1
5.3.8 Memory Management Status Register - MM_STAT
5.3.9 Mbox Control Register - M_CTL
5.3.10 Dcache Control Register - DC_CTL
5.3.11 Dcache Status Register - DC_STAT
5.4 Cbox CSRs and IPRs
5.4.1 Cbox Data Register - C_DATA
5.4.2 Cbox Shift Register - C_SHFT
5.4.3 Cbox WRITE_ONCE Chain Description
5.4.4 Cbox WRITE_MANY Chain Description
5.4.5 Cbox Read Register (IPR) Description


6 Privileged Architecture Library Code


6.1 PALcode Description
6.2 PALmode Environment
6.3 Required PALcode Function Codes
6.4 Opcodes Reserved for PALcode
6.4.1 HW_LD Instruction
6.4.2 HW_ST Instruction
6.4.3 HW_RET Instruction
6.4.4 HW_MFPR and HW_MTPR Instructions
6.5 Internal Processor Register Access Mechanisms
6.5.1 IPR Scoreboard Bits
6.5.2 Hardware Structure of Explicitly Written IPRs
6.5.3 Hardware Structure of Implicitly Written IPRs
6.5.4 IPR Access Ordering
6.5.5 Correct Ordering of Explicit Writers Followed by Implicit Readers
6.5.6 Correct Ordering of Explicit Readers Followed by Implicit Writers
6.6 PALshadow Registers
6.7 PALcode Emulation of the FPCR
6.7.1 Status Flags
6.7.2 MF_FPCR
6.7.3 MT_FPCR
6.8 PALcode Entry Points
6.8.1 CALL_PAL Entry Points
6.8.2 PALcode Exception Entry Points
6.9 Translation Buffer (TB) Fill Flows
6.9.1 DTB Fill
6.9.2 ITB Fill
6.10 Performance Counter Support
6.10.1 General Precautions
6.10.2 Aggregate Mode Programming Guidelines
6.10.2.1 Aggregate Mode Precautions
6.10.2.2 Operation
6.10.2.3 Aggregate Counting Mode Description
6.10.2.3.1 Cycle counting
6.10.2.3.2 Retired instructions cycles
6.10.2.3.3 Bcache miss or long latency probes cycles
6.10.2.3.4 Mbox replay traps cycles
6.10.2.4 Counter Modes for Aggregate Mode
6.10.3 ProfileMe Mode Programming Guidelines
6.10.3.1 ProfileMe Mode Precautions
6.10.3.2 Operation
6.10.3.3 ProfileMe Counting Mode Description
6.10.3.3.1 Cycle counting
6.10.3.3.2 Inum retire delay cycles
6.10.3.3.3 Retired instructions cycles
6.10.3.3.4 Bcache miss or long latency probes cycles
6.10.3.3.5 Mbox replay traps cycles
6.10.3.4 Counter Modes for ProfileMe Mode


7 Initialization and Configuration


7.1 Power-Up Reset Flow and the RESET_L and DCOK_H Pins
7.1.1 Power Sequencing and Reset State for Signal Pins
7.1.2 Clock Forwarding and System Clock Ratio Configuration
7.1.3 PLL Ramp Up
7.1.4 BiST and SROM Load and the TestStat_H Pin
7.1.5 Clock Forward Reset and System Interface Initialization
7.2 Fault Reset Flow
7.3 Energy Star Certification and Sleep Mode Flow
7.4 Warm Reset Flow
7.5 Array Initialization
7.6 Initialization Mode Processing
7.7 External Interface Initialization
7.8 Internal Processor Register Power-Up Reset State
7.9 IEEE 1149.1 Test Port Reset
7.10 Reset State Machine
7.11 Phased-Lock Loop (PLL) Functional Description
7.11.1 Differential Reference Clocks
7.11.2 PLL Output Clocks


7.11.2.1 GCLK


7.11.2.2 Differential 21264A Clocks
7.11.2.3 Nominal Operating Frequency
7.11.2.4 Power-Up/Reset Clocking


8 Error Detection and Error Handling


8.1 Data Error Correction Code
8.2 Icache Data or Tag Parity Error
8.3 Dcache Tag Parity Error
8.4 Dcache Data Single-Bit Correctable ECC Error
8.4.1 Load Instruction
8.4.2 Store Instruction (Quadword or Smaller)
8.4.3 Dcache Victim Extracts
8.5 Dcache Store Second Error
8.6 Dcache Duplicate Tag Parity Error
8.7 Bcache Tag Parity Error
8.8 Controlling Bcache Block Parity Calculation
8.9 Bcache Data Single-Bit Correctable ECC Error
8.9.1 Icache Fill from Bcache
8.9.2 Dcache Fill from Bcache
8.9.3 Bcache Victim Read
8.9.3.1 Bcache Victim Read During a Dcache/Bcache Miss
8.9.3.2 Bcache Victim Read During an ECB Instruction
8.10 Memory/System Port Single-Bit Data Correctable ECC Error
8.10.1 Icache Fill from Memory
8.10.2 Dcache Fill from Memory
8.11 Bcache Data Single-Bit Correctable ECC Error on a Probe
8.12 Double-Bit Fill Errors
8.13 Error Case Summary


9 Electrical Data


9.1 Electrical Characteristics
9.2 DC Characteristics
9.3 Power Supply Sequencing and Avoiding Potential Failure Mechanisms
9.4 AC Characteristics


10 Thermal Management


10.1 Operating Temperature
10.2 Heat Sink Specifications
10.3 Thermal Design Considerations


11 Testability and Diagnostics


11.1 Test Pins
11.2 SROM/Serial Diagnostic Terminal Port
11.2.1 SROM Load Operation
11.2.2 Serial Terminal Port
11.3 IEEE 1149.1 Port
11.4 TestStat_H Pin
11.5 Power-Up Self-Test and Initialization
11.5.1 Built-in Self-Test


11.5.2 SROM Initialization
11.5.2.1 Serial Instruction Cache Load Operation
11.6 Notes on IEEE 1149.1 Operation and Compliance


A Alpha Instruction Set


A.1 Alpha Instruction Summary
A.2 Reserved Opcodes
A.2.1 Opcodes Reserved for Compaq
A.2.2 Opcodes Reserved for PALcode
A.3 IEEE Floating-Point Instructions
A.4 VAX Floating-Point Instructions
A.5 Independent Floating-Point Instructions
A.6 Opcode Summary
A.7 Required PALcode Function Codes
A.8 IEEE Floating-Point Conformance


B 21264A Boundary-Scan Register


B.1 Boundary-Scan Register
B.1.1 BSDL Description of the Alpha 21264A Boundary-Scan Register


C Serial Icache Load Predecode Values


D PALcode Restrictions and Guidelines


D.1 Restriction 1: Reset Sequence Required by Retire Logic and Mapper
D.2 Restriction 2: No Multiple Writers to IPRs in Same Scoreboard Group
D.3 Restriction 4: No Writers and Readers to IPRs in Same Scoreboard Group
D.4 Guideline 6: Avoid Consecutive Read-Modify-Write-Read-Modify-Write
D.5 Restriction 7: Replay Trap, Interrupt Code Sequence, and STF/ITOF
D.6 Restriction 9: PALmode Istream Address Ranges
D.7 Restriction 10: Duplicate IPR Mode Bits
D.8 Restriction 11: Ibox IPR Update Synchronization
D.9 Restriction 12: MFPR of Implicitly-Written IPRs EXC_ADDR, IVA_FORM, and EXC_SUM
D.10 Restriction 13: DTB Fill Flow Collision
D.11 Restriction 14: HW_RET
D.12 Guideline 16: JSR-BAD VA
D.13 Restriction 17: MTPR to DTB_TAG0/DTB_PTE0/DTB_TAG1/DTB_PTE1
D.14 Restriction 18: No FP Operates, FP Conditional Branches, FTOI, or STF in Same Fetch Block as HW_MTPR
D.15 Restriction 19: HW_RET/STALL After Updating the FPCR by way of MT_FPCR in PALmode
D.16 Guideline 20: I_CTL[SBE] Stream Buffer Enable
D.17 Restriction 21: HW_RET/STALL After HW_MTPR ASN0/ASN1
D.18 Restriction 22: HW_RET/STALL After HW_MTPR IS0/IS1
D.19 Restriction 23: HW_ST/P/CONDITIONAL Does Not Clear the Lock Flag
D.20 Restriction 24: HW_RET/STALL After HW_MTPR IC_FLUSH, IC_FLUSH_ASM, CLEAR_MAP
D.21 Restriction 25: HW_MTPR ITB_IA After Reset
D.22 Guideline 26: Conditional Branches in PALcode
D.23 Restriction 27: Reset of 'Force-Fail Lock Flag' State in PALcode
D.24 Restriction 28: Enforce Ordering Between IPRs Implicitly Written by Loads and Subsequent Loads
D.25 Guideline 29: JSR, JMP, RET, and JSR_COR in PALcode


D.26 Restriction 30: HW_MTPR and HW_MFPR to the Cbox CSR
D.27 Restriction 31: I_CTL[VA_48] Update
D.28 Restriction 32: PCTR_CTL Update
D.29 Restriction 33: HW_LD Physical/Lock Use
D.30 Restriction 34: Writing Multiple ITB Entries in the Same PALcode Flow
D.31 Guideline 35: HW_INT_CLR Update
D.32 Restriction 36: Updating I_CTL[SDE]
D.33 Restriction 37: Updating VA_CTL[VA_48]
D.34 Restriction 38: Updating PCTR_CTL
D.35 Guideline 39: Writing Multiple DTB Entries in the Same PAL Flow
D.36 Restriction 40: Scrubbing a Single-Bit Error


E 21264A-to-Bcache Pin Interconnections


E.1 Forwarding Clock Pin Groupings
E.2 Late-Write Non-Bursting SSRAMs
E.3 Dual-Data Rate SSRAMs


Glossary


Index


Compaq Confidential 
21264A Revision 1.1 — Subject To Change 


Figures


2-1   21264A Block Diagram
2-2   Branch Predictor
2-3   Local Predictor
2-4   Global Predictor
2-5   Choice Predictor
2-6   Integer Execution Unit - Clusters 0 and 1
2-7   Floating-Point Execution Units
2-8   Pipeline Organization
2-9   Pipeline Timing for Integer Load Instructions
2-10  Pipeline Timing for Floating-Point Load Instructions
2-11  Floating-Point Control Register
2-12  Typical Uniprocessor Configuration
2-13  Typical Multiprocessor Configuration
3-1   21264A Microprocessor Logic Symbol
3-2   Package Dimensions
3-3   21264A Top View (Pin Down)
3-4   21264A Bottom View (Pin Up)
4-1   21264A System and Bcache Interfaces
4-2   21264A Bcache Interface Signals
4-3   Cache Subset Hierarchy
4-4   System Interface Signals
4-5   Fast Transfer Timing Example
4-6   SysFillValid_L Timing
5-1   Cycle Counter Register
5-2   Cycle Counter Control Register
5-3   Virtual Address Register
5-4   Virtual Address Control Register
5-5   Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0)
5-6   Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0)
5-7   Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1)
5-8   ITB Tag Array Write Register
5-9   ITB PTE Array Write Register
5-10  ITB Invalidate Single Register
5-11  ProfileMe PC Register
5-12  Exception Address Register
5-13  Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0)
5-14  Instruction Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0)
5-15  Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1)
5-16  Interrupt Enable and Current Processor Mode Register
5-17  Software Interrupt Request Register
5-18  Interrupt Summary Register
5-19  Hardware Interrupt Clear Register
5-20  Exception Summary Register
5-21  PAL Base Register
5-22  Ibox Control Register
5-23  Ibox Status Register
5-24  Process Context Register
5-25  Performance Counter Control Register
5-26  DTB Tag Array Write Registers 0 and 1
5-27  DTB PTE Array Write Registers 0 and 1
5-28  DTB Alternate Processor Mode Register
5-29  Dstream Translation Buffer Invalidate Single Registers
5-30  Dstream Translation Buffer Address Space Number Registers 0 and 1
5-31  Memory Management Status Register
5-32  Mbox Control Register
5-33  Dcache Control Register
5-34  Dcache Status Register
5-35  Cbox Data Register
5-36  Cbox Shift Register
5-37  WRITE_MANY Chain Write Transaction Example
6-1   HW_LD Instruction Format
6-2   HW_ST Instruction Format
6-3   HW_RET Instruction Format
6-4   HW_MFPR and HW_MTPR Instructions Format
6-5   Single-Miss DTB Instructions Flow Example
6-6   ITB Miss Instructions Flow Example
7-1   Power-Up Timing Sequence
7-2   Fault Reset Sequence of Operation
7-3   Sleep Mode Sequence of Operation
7-4   Example for Initializing Bcache
7-5   21264A Reset State Machine State Diagram
10-1  Type 1 Heat Sink
10-2  Type 2 Heat Sink
10-3  Type 3 Heat Sink
11-1  TestStat_H Pin Timing During Power-Up Built-In Self-Test (BIST)
11-2  TestStat_H Pin Timing During Built-In Self-Initialization (BiSI)
11-3  SROM Content Map



Tables


1-1   Integer Data Types
2-1   Pipeline Abort Delay (GCLK Cycles)
2-2   Instruction Name, Pipeline, and Types
2-3   Instruction Group Definitions and Pipeline Unit
2-4   Instruction Class Latency in Cycles
2-5   Minimum Retire Latencies for Instruction Classes
2-6   Instructions Retired Without Execution
2-7   Rules for I/O Address Space Load Instruction Data Merging
2-8   Rules for I/O Address Space Store Instruction Data Merging
2-9   MAF Merging Rules
2-10  Memory Reference Ordering
2-11  I/O Reference Ordering
2-12  TB Fill Flow Example Sequence 1
2-13  TB Fill Flow Example Sequence 2
2-14  Floating-Point Control Register Fields
2-15  21264A AMASK Values
2-16  AMASK Bit Assignments
3-1   Signal Pin Types Definitions
3-2   21264A Signal Descriptions
3-3   21264A Signal Descriptions by Function
3-4   Pin List Sorted by Signal Name
3-5   Pin List Sorted by PGA Location
3-6   Ground and Power (VSS and VDD) Pin List
4-1   Translation of Internal References to External Interface Reference
4-2   21264A-Supported Cache Block States
4-3   Cache Block State Transitions
4-4   System Responses to 21264A Commands
4-5   System Responses to 21264A Commands and 21264A Reactions
4-6   System Port Pins
4-7   Programming Values for System Interface Clocks
4-8   Program Values for Data-Sample/Drive CSRs
4-9   Forwarded Clocks and Frame Clock Ratio
4-10  Bank Interleave on Cache Block Boundary Mode of Operation
4-11  Page Hit Mode of Operation
4-12  21264A-to-System Command Fields Definitions
4-13  Maximum Physical Address for Short Bus Format
4-14  21264A-to-System Commands Descriptions
4-15  Programming INVAL_TO_DIRTY_ENABLE[1:0]
4-16  Programming SET_DIRTY_ENABLE[2:0]
4-17  21264A ProbeResponse Command
4-18  ProbeResponse Fields Descriptions
4-19  System-to-21264A Probe Commands
4-20  System-to-21264A Probe Commands Fields Descriptions
4-21  Data Movement Selection by Probe[4:3]
4-22  Next Cache Block State Selection by Probe[2:0]
4-23  Data Transfer Command Format
4-24  SysDc[4:0] Field Description
4-25  SYSCLK Cycles Between SysAddOut and SysData
4-26  Cbox CSR SYSDC_DELAY[4:0] Examples
4-27  Four Timing Examples
4-28  Data Wrapping Rules
4-29  System Wrap and Deliver Data
4-30  Wrap Interleave Order
4-31  Wrap Order for Double-Pumped Data Transfers
4-32  21264A Commands with NXM Addresses and System Response
4-33  21264A Response to System Probe and In-Flight Command Interaction
4-34  Rules for System Control of Cache Status Update Order
4-35  Range of Maximum Bcache Clock Ratios
4-36  Bcache Port Pins
4-37  BC_CPU_CLK_DELAY[1:0] Values
4-38  BC_CLK_DELAY[1:0] Values
4-39  Program Values to Set the Cache Clock Period (Single-Data)
4-40  Program Values to Set the Cache Clock Period (Dual-Data Rate)
4-41  Data-Sample/Drive Cbox CSRs
4-42  Programming the Bcache to Support Each Size of the Bcache
4-43  Programming the Bcache Control Pins
4-44  Control Pin Assertion for RAM_TYPE A
4-45  Control Pin Assertion for RAM_TYPE B
4-46  Control Pin Assertion for RAM_TYPE C
4-47  Control Pin Assertion for RAM_TYPE D
5-1   Internal Processor Registers
5-2   Cycle Counter Control Register Fields Description
5-3   Virtual Address Control Register Fields Description
5-4   ProfileMe PC Fields Description
5-5   IER_CM Register Fields Description
5-6   Software Interrupt Request Register Fields Description
5-7   Interrupt Summary Register Fields Description
5-8   Hardware Interrupt Clear Register Fields Description
5-9   Exception Summary Register Fields Description
5-10  PAL Base Register Fields Description
5-11  Ibox Control Register Fields Description
5-12  Ibox Status Register Fields Description
5-13  IPR Index Bits and Register Fields
5-14  Process Context Register Fields Description
5-15  Performance Counter Control Register Fields Description
5-16  Performance Counter Control Register Input Select Fields
5-17  DTB Alternate Processor Mode Register Fields Description
5-18  Memory Management Status Register Fields Description
5-19  Mbox Control Register Fields Description
5-20  Dcache Control Register Fields Description
5-21  Dcache Status Register Fields Description
5-22  Cbox Data Register Fields Description
5-23  Cbox Shift Register Fields Description
5-24  Cbox WRITE_ONCE Chain Order
5-25  Cbox WRITE_MANY Chain Order
5-26  Cbox Read IPR Fields Description
6-1   Required PALcode Function Codes
6-2   Opcodes Reserved for PALcode
6-3   HW_LD Instruction Fields Descriptions
6-4   HW_ST Instruction Fields Descriptions
6-5   HW_RET Instruction Fields Descriptions
6-6   HW_MFPR and HW_MTPR Instructions Fields Descriptions
6-7   Paired Instruction Fetch Order
6-8   PALcode Exception Entry Locations
6-9   IPRs Used for Performance Counter Support
6-10  Aggregate Mode Returned IPR Contents
6-11  Aggregate Mode Performance Counter IPR Input Select Fields
6-12  CMOV Decomposed
6-13  ProfileMe Mode Returned IPR Contents
6-14  ProfileMe Mode PCTR_CTL Input Select Fields
7-1   21264A Reset State Machine Major Operations
7-2   Signal Pin Reset State
7-3   Pin Signal Names and Initialization State
7-4   Power-Up Flow Signals and Their Constraints
7-5   Effect on IPRs After Fault Reset
7-6   Effect on IPRs After Transition Through Sleep Mode
7-7   Signals and Constraints for the Sleep Mode Sequence
7-8   Effect on IPRs After Warm Reset
7-9   WRITE_MANY Chain CSR Values for Bcache Initialization
7-10  Internal Processor Registers at Power-Up Reset State
7-11  21264A Reset State Machine State Descriptions
7-12  Differential Reference Clock Frequencies in Full-Speed Lock
8-1   21264A Error Detection Mechanisms
8-2   64-Bit Data and Check Bit ECC Code
8-3   Error Case Summary
9-1   Maximum Electrical Ratings
9-2   Signal Types
9-3   VDD (I_DC_POWER)
9-4   Input DC Reference Pin (I_DC_REF)
9-5   Input Differential Amplifier Receiver (I_DA)
9-6   Input Differential Amplifier Clock Receiver (I_DA_CLK)
9-7   Pin Type: Open-Drain Output Driver (O_OD)
9-8   Bidirectional, Differential Amplifier Receiver, Open-Drain Output Driver (B_DA_OD)
9-9   Pin Type: Open-Drain Driver for Test Pins (O_OD_TP)
9-10  Bidirectional, Differential Amplifier Receiver, Push-Pull Output Driver (B_DA_PP)
9-11  Push-Pull Output Driver (O_PP)
9-12  Push-Pull Output Clock Driver (O_PP_CLK)
9-13  AC Specifications
10-1  Operating Temperature at Heat Sink Center (Tc)
10-2  θca at Various Airflows for 21264A
10-3  Maximum Ta for 21264A @ 600 MHz and @ 2.0 V with Various Airflows
10-4  Maximum Ta for 21264A @ 667 MHz and @ 2.0 V with Various Airflows
10-5  Maximum Ta for 21264A @ 700 MHz and @ 2.0 V with Various Airflows
10-6  Maximum Ta for 21264A @ 733 MHz and @ 2.0 V with Various Airflows
10-7  Maximum Ta for 21264A @ 750 MHz and @ 2.0 V with Various Airflows
11-1  Dedicated Test Port Pins
11-2  IEEE 1149.1 Instructions and Opcodes
11-3  TAP Controller State Machine
11-4  Icache Bit Fields in an SROM Line
A-1   Instruction Format and Opcode Notation
A-2   Architecture Instructions
A-3   Opcodes Reserved for Compaq
A-4   Opcodes Reserved for PALcode
A-5   IEEE Floating-Point Instruction Function Codes
A-6   VAX Floating-Point Instruction Function Codes
A-7   Independent Floating-Point Instruction Function Codes
A-8   Opcode Summary
A-9   Key to Opcode Summary Used in Table A-8
A-10  Required PALcode Function Codes
A-11  Exceptional Input and Output Conditions
E-1   Bcache Forwarding Clock Pin Groupings
E-2   Late-Write Non-Bursting SSRAMs Data Pin Usage
E-3   Late-Write Non-Bursting SSRAMs Tag Pin Usage
E-4   Dual-Data Rate SSRAM Data Pin Usage
E-5   Dual-Data Rate SSRAMs Tag Pin Usage


Preface


Audience


This specification is for system designers and programmers who use the Alpha 21264A 
microprocessor (referred to as the 21264A). 


This specification contains the following chapters and appendixes: 


Chapter 1, Introduction, introduces the 21264A and provides an overview of the Alpha
architecture.


Chapter 2, Internal Architecture, describes the major hardware functions and the
internal chip architecture. It describes performance measurement facilities, coding
rules, and design examples.


Chapter 3, Hardware Interface, lists and describes the internal hardware interface
signals, and provides mechanical data and packaging information, including signal pin
lists.


Chapter 4, Cache and External Interfaces, describes the external bus functions and 
transactions, lists bus commands, and describes the clock functions. 


Chapter 5, Internal Processor Registers, lists and describes the internal processor regis- 
ter set. 


Chapter 6, Privileged Architecture Library Code, describes the privileged architecture 
library code (PALcode). 


Chapter 7, Initialization and Configuration, describes the initialization and configura- 
tion sequence. 


Chapter 8, Error Detection and Error Handling, describes error detection and error
handling.


Chapter 9, Electrical Data, provides electrical data and describes signal integrity issues. 
Chapter 10, Thermal Management, provides information about thermal management. 
Chapter 11, Testability and Diagnostics, describes chip and system testability features. 
Appendix A, Alpha Instruction Set, summarizes the Alpha instruction set. 

Appendix B, 21264A Boundary-Scan Register, presents the BSDL description of the 
21264A boundary-scan register. 
Appendix C, Serial Icache Load Predecode Values, provides a pointer to the Alpha
Motherboards Software Developer's Kit (SDK), which contains this information.


Appendix D, PALcode Restrictions and Guidelines, lists restrictions and guidelines 
that must be adhered to when generating PALcode. 


Appendix E, 21264A-to-Bcache Pin Interconnections, describes the pin connections
between the 21264A and the Bcache SSRAMs, including the forwarding clock pin groupings.


The Glossary lists and defines terms associated with the 21264A. 


An Index is provided at the end of the document. 


Documentation Included by Reference


The companion volume to this specification, the Alpha Architecture Handbook, Version 4,
contains the instruction set architecture. You can access this document from the
following website:
ftp.digital.com/pub/Digital/info/semiconductor/literature/dsc-library.html


Also available is the Alpha Architecture Reference Manual, Third Edition, which
contains the complete architecture information. That manual is available at bookstores
from the Digital Press as EQ-W938E-DP.


Complete information on the 21264A phase-lock loop (PLL) is available in the 21264A
PLL Specification, located in the same repository as the 21264A Specifications (this
document).




Terminology and Conventions 


This section defines the abbreviations, terminology, and other conventions used 
throughout this document. 


Abbreviations 
• Binary Multiples

The abbreviations K, M, and G (kilo, mega, and giga) represent binary multiples
and have the following values.

K = 2^10 (1024)
M = 2^20 (1,048,576)
G = 2^30 (1,073,741,824)

For example:

2KB       = 2 kilobytes  = 2 × 2^10 bytes
4MB       = 4 megabytes  = 4 × 2^20 bytes
8GB       = 8 gigabytes  = 8 × 2^30 bytes
2K pixels = 2 kilopixels = 2 × 2^10 pixels
4M pixels = 4 megapixels = 4 × 2^20 pixels
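
The binary multiples above can be checked with a short sketch. It is illustrative only; the helper name to_bytes and the unit strings are assumptions made for this example and are not defined by this specification.

```python
# Binary multiples as defined in this section: K, M, and G are powers of two.
K = 2**10  # 1024
M = 2**20  # 1,048,576
G = 2**30  # 1,073,741,824


def to_bytes(count, unit):
    """Convert a count in KB, MB, or GB to bytes (hypothetical helper)."""
    multiples = {"KB": K, "MB": M, "GB": G}
    return count * multiples[unit]


if __name__ == "__main__":
    for count, unit in ((2, "KB"), (4, "MB"), (8, "GB")):
        print(count, unit, "=", to_bytes(count, unit), "bytes")
```

With these definitions, 2KB is 2048 bytes and 4MB is 4,194,304 bytes, which is how sizes such as cache capacities should be read in this document.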


• Register Access


The abbreviations used to indicate the type of access to register fields and bits have 
the following definitions: 


Abbreviation  Meaning

IGN           Ignore
              Bits and fields specified are ignored on writes.

MBZ           Must Be Zero
              Software must never place a nonzero value in bits and fields
              specified as MBZ. A nonzero read produces an Illegal Operand
              exception. Also, MBZ fields are reserved for future use.

RAZ           Read As Zero
              Bits and fields return a zero when read.

RC            Read Clears
              Bits and fields are cleared when read. Unless otherwise specified,
              such bits cannot be written.

RES           Reserved
              Bits and fields are reserved by Compaq and should not be used;
              however, zeros can be written to reserved fields that cannot be
              masked.

RO            Read Only
              The value may be read by software. It is written by hardware.
              Software write operations are ignored.

RO,n          Read Only, and takes the value n at power-on reset.
              The value may be read by software. It is written by hardware.
              Software write operations are ignored.

RW            Read/Write
              Bits and fields can be read and written.

RW,n          Read/Write, and takes the value n at power-on reset.
              Bits and fields can be read and written.

W1C           Write One to Clear
              If read operations are allowed to the register, then the value may
              be read by software. If it is a write-only register, then a read
              operation by software returns an UNPREDICTABLE result. Software
              write operations of a 1 cause the bit to be cleared by hardware.
              Software write operations of a 0 do not modify the state of the
              bit.

W1S           Write One to Set
              If read operations are allowed to the register, then the value may
              be read by software. If it is a write-only register, then a read
              operation by software returns an UNPREDICTABLE result. Software
              write operations of a 1 cause the bit to be set by hardware.
              Software write operations of a 0 do not modify the state of the
              bit.

WO            Write Only
              Bits and fields can be written but not read.

WO,n          Write Only, and takes the value n at power-on reset.
              Bits and fields can be written but not read.
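
The difference between the write-one-to-clear, write-one-to-set, and read-clears access types is easiest to see in a small behavioral model. The sketch below is a simplified software model, not a description of 21264A hardware. The class name Bit is hypothetical, and treating software writes to RC bits as ignored is an assumption made for the example.

```python
class Bit:
    """One register bit with an access type (hypothetical model)."""

    def __init__(self, access, value=0):
        self.access = access  # "RW", "RO", "RC", "W1C", or "W1S"
        self.value = value    # hardware may set this attribute directly

    def write(self, data):
        """Model a software write of 0 or 1 to the bit."""
        if self.access == "RW":
            self.value = data
        elif self.access == "W1C" and data == 1:
            self.value = 0    # writing 1 clears; writing 0 changes nothing
        elif self.access == "W1S" and data == 1:
            self.value = 1    # writing 1 sets; writing 0 changes nothing
        # RO and RC: software writes are ignored in this model

    def read(self):
        """Model a software read; RC bits are cleared as a side effect."""
        result = self.value
        if self.access == "RC":
            self.value = 0
        return result
```

For example, a W1C interrupt-pending bit that hardware has set stays set when software writes 0 to it and is cleared only when software writes 1.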
• Sign extension


SEXT(x) means x is sign-extended to the required size. 
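

As an illustration of SEXT(x), the following sketch sign-extends a field of a given width to 64 bits. The function name sext and its parameters are assumptions made for this example; the specification itself defines only the SEXT(x) notation.

```python
def sext(x, from_bits, to_bits=64):
    """Sign-extend the low from_bits of x to to_bits.

    The result is returned as an unsigned bit pattern of width to_bits.
    """
    x &= (1 << from_bits) - 1        # keep only the source field
    sign = 1 << (from_bits - 1)      # sign bit of the source field
    signed = (x ^ sign) - sign       # two's-complement value of the field
    return signed & ((1 << to_bits) - 1)


if __name__ == "__main__":
    print(hex(sext(0x8000, 16)))     # negative 16-bit value
    print(hex(sext(0x7FFF, 16)))     # positive 16-bit value
```

A 16-bit value with its top bit set, such as 0x8000, extends to 0xFFFFFFFFFFFF8000, while 0x7FFF is unchanged.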


Addresses 


Unless otherwise noted, all addresses and offsets are hexadecimal. 


Aligned and Unaligned 


The terms aligned and naturally aligned are interchangeable and refer to data objects
that are powers of two in size. An aligned datum of size 2^n is stored in memory at a
byte address that is a multiple of 2^n; that is, one that has n low-order zeros. For
example, an aligned 64-byte stack frame has a memory address that is a multiple of 64.


A datum of size 2^n is unaligned if it is stored in a byte address that is not a
multiple of 2^n.
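

The alignment rule reduces to a test of the low-order address bits. The sketch below assumes a hypothetical helper named is_aligned; it is an illustration of the definition, not part of the specification.

```python
def is_aligned(address, size):
    """Return True if address is naturally aligned for a datum of size bytes.

    size must be a power of two (2^n); the datum is aligned when the n
    low-order bits of the byte address are all zero.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    return address & (size - 1) == 0


if __name__ == "__main__":
    print(is_aligned(0x1000, 64))  # 64-byte stack frame on a 64-byte boundary
    print(is_aligned(0x1008, 64))  # same frame size at a misaligned address
```

Address 0x1008 is aligned for a quadword (8 bytes) but unaligned for a 64-byte stack frame.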


Bit Notation 


Multiple-bit fields can include contiguous and noncontiguous bits contained in square 
brackets ([]). Multiple contiguous bits are indicated by a pair of numbers separated by a 
colon [:]. For example, [9:7,5,2:0] specifies bits 9,8,7,5,2,1, and 0. Similarly, single bits 
are frequently indicated with square brackets. For example, [27] specifies bit 27. See 
also Field Notation. 
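

To make the bit notation concrete, the following sketch expands a bracketed specification into the individual bit numbers it names. The function name expand_bits is hypothetical and the parser accepts only the simple forms shown in this section.

```python
def expand_bits(spec):
    """Expand a bit specification such as "[9:7,5,2:0]" into bit numbers."""
    bits = []
    for part in spec.strip("[]").split(","):
        if ":" in part:
            high, low = (int(n) for n in part.split(":"))
            bits.extend(range(high, low - 1, -1))  # inclusive, high to low
        else:
            bits.append(int(part))                 # a single bit
    return bits


if __name__ == "__main__":
    print(expand_bits("[9:7,5,2:0]"))
    print(expand_bits("[27]"))
```

Applied to the example in the text, [9:7,5,2:0] expands to bits 9, 8, 7, 5, 2, 1, and 0.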


Caution 


Cautions indicate potential damage to equipment or loss of data. 




Data Units 


The following data unit terminology is used throughout this manual. 





Term       Words  Bytes  Bits  Other

Byte       1/2    1      8     -
Word       1      2      16    -
Longword   2      4      32    Dword
Quadword   4      8      64    2 longwords


Do Not Care (X) 
A capital X represents any valid value. 


External 


Unless otherwise stated, external means not contained in the chip. 


Field Notation 


The names of single-bit and multiple-bit fields can be used rather than the actual bit
numbers (see Bit Notation). When the field name is used, it is contained in square
brackets ([]). For example, RegisterName[LowByte] specifies RegisterName[7:0].


Note


Notes emphasize particularly important information.


Numbering


All numbers are decimal or hexadecimal unless otherwise indicated. The prefix 0x
indicates a hexadecimal number. For example, 19 is decimal, but 0x19 and 0x19A are
hexadecimal (also see Addresses). Otherwise, the base is indicated by a subscript; for
example, 100₂ is a binary number.
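

The numbering conventions can be checked with a short sketch. The variable names are illustrative only, and the subscript-2 form used in the text is written here as an explicit base argument.

```python
# No prefix: decimal.
decimal_value = int("19")

# The 0x prefix marks a hexadecimal number.
hex_value = int("0x19", 16)
hex_value_long = int("0x19A", 16)

# A subscript 2 in the text marks a binary number (100 base 2).
binary_value = int("100", 2)

if __name__ == "__main__":
    print(decimal_value, hex_value, hex_value_long, binary_value)
```

The same digits therefore name different values: 19 is nineteen, while 0x19 is twenty-five.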


Ranges and Extents 


Ranges are specified by a pair of numbers separated by two periods (..) and are inclu- 
sive. For example, a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4. 


Extents are specified by a pair of numbers in square brackets ([]) separated by a colon 
(:) and are inclusive. Bit fields are often specified as extents. For example, bits [7:3] 
specifies bits 7, 6, 5, 4, and 3. 
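

Both ranges and extents are inclusive at each end, which the following sketch illustrates. The function names range_members and extract_extent are hypothetical and are used here only to show the convention.

```python
def range_members(low, high):
    """List the integers in the inclusive range low..high."""
    return list(range(low, high + 1))


def extract_extent(value, high, low):
    """Return the bits [high:low] of value, shifted down to bit 0."""
    width = high - low + 1               # an extent includes both end bits
    return (value >> low) & ((1 << width) - 1)


if __name__ == "__main__":
    print(range_members(0, 4))             # the range 0..4
    print(extract_extent(0xA5, 7, 3))      # bits [7:3] of 0xA5
```

The extent [7:3] is five bits wide, so extracting it from 0xA5 (binary 10100101) yields binary 10100.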


Register Figures 
The gray areas in register figures indicate reserved or unused bits and fields. 


Bit ranges that are coupled with the field name specify the bits of the named field that 
are included in the register. The bit range may, but need not necessarily, correspond to 
the bit Extent in the register. See the explanation above Table 5—1 for more information. 


Signal Names 


The following examples describe signal-name conventions used in this document. 
AlphaSignal[n:n]     Boldface, mixed-case type denotes signal names that are
                     assigned internal and external to the 21264A (that is, the
                     signal traverses a chip interface pin).

AlphaSignal_x[n:n]   When a signal has high and low assertion states, a lowercase
                     italic x represents the assertion states. For example,
                     SignalName_x[3:0] represents SignalName_H[3:0] and
                     SignalName_L[3:0].


UNDEFINED 


Operations specified as UNDEFINED may vary from moment to moment, implementa- 
tion to implementation, and instruction to instruction within implementations. The 
operation may vary in effect from nothing to stopping system operation. 


UNDEFINED operations may halt the processor or cause it to lose information. How- 
ever, UNDEFINED operations must not cause the processor to hang, that is, reach an 
unhalted state from which there is no transition to a normal state in which the machine 
executes instructions.


UNPREDICTABLE 


UNPREDICTABLE results or occurrences do not disrupt the basic operation of the pro- 
cessor; it continues to execute instructions in its normal manner. Further: 


• Results or occurrences specified as UNPREDICTABLE may vary from moment to
moment, implementation to implementation, and instruction to instruction within
implementations. Software can never depend on results specified as
UNPREDICTABLE.


• An UNPREDICTABLE result may acquire an arbitrary value subject to a few
constraints. Such a result may be an arbitrary function of the input operands or of
any state information that is accessible to the process in its current access mode.
UNPREDICTABLE results may be unchanged from their previous values.


Operations that produce UNPREDICTABLE results may also produce exceptions. 


• An occurrence specified as UNPREDICTABLE may happen or not based on an
arbitrary choice function. The choice function is subject to the same constraints as 
are UNPREDICTABLE results and, in particular, must not constitute a security 
hole. 


Specifically, UNPREDICTABLE results must not depend upon, or be a function of, 
the contents of memory locations or registers that are inaccessible to the current 
process in the current access mode. 


Also, operations that may produce UNPREDICTABLE results must not: 


~— Write or modify the contents of memory locations or registers to which the cur- 
rent process in the current access mode does not have access, or 


— Halt or hang the system or any of its components. 


For example, a security hole would exist if some UNPREDICTABLE result 
depended on the value of a register in another process, on the contents of processor 
temporary registers left behind by some previously running process, or on a 
sequence of actions of different processes. 


X Do not care. A capital X represents any valid value.


Introduction





This chapter provides a brief introduction to the Alpha architecture, Compaq’s RISC 
(reduced instruction set computing) architecture designed for high performance. The 
chapter then summarizes the specific features of the Alpha 21264A microprocessor 
(hereafter called the 21264A) that implements the Alpha architecture. Appendix A pro- 
vides a list of Alpha instructions. 


The companion volume to this specification, the Alpha Architecture Handbook, Version 4,
contains the instruction set architecture. Also available is the Alpha Architecture
Reference Manual, Third Edition, which contains the complete architecture information.


1.1 The Architecture 


The Alpha architecture is a 64-bit load and store RISC architecture designed with par- 
ticular emphasis on speed, multiple instruction issue, multiple processors, and software 
migration from many operating systems. 


All registers are 64 bits long and all operations are performed between 64-bit registers. 
All instructions are 32 bits long. Memory operations are either load or store operations. 
All data manipulation is done between registers. 


The Alpha architecture supports the following data types: 

• 8-, 16-, 32-, and 64-bit integers

• IEEE 32-bit and 64-bit floating-point formats

• VAX architecture 32-bit and 64-bit floating-point formats


In the Alpha architecture, instructions interact with each other only by one instruction 
writing to a register or memory location and another instruction reading from that regis- 
ter or memory location. This use of resources makes it easy to build implementations 
that issue multiple instructions every CPU cycle. 


The 21264A uses a set of subroutines, called privileged architecture library code (PAL- 
code), that is specific to a particular Alpha operating system implementation and hard- 
ware platform. These subroutines provide operating system primitives for context 
switching, interrupts, exceptions, and memory management. These subroutines can be 
invoked by hardware or CALL_PAL instructions. CALL_PAL instructions use the 
function field of the instruction to vector to a specified subroutine. PALcode is written 
in standard machine code with some implementation-specific extensions to provide 
direct access to low-level hardware functions. PALcode supports optimizations for
multiple operating systems, flexible memory-management implementations, and
multi-instruction atomic sequences.
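

The function-field vectoring described above can be illustrated with a short Python
sketch. It assumes the Alpha PALcode instruction format, in which bits [31:26] hold the
opcode (taken here to be 0x00 for CALL_PAL) and bits [25:0] hold the function field.
The helper name decode_call_pal is hypothetical, and the mapping from a function code
to a PALcode entry address is implementation specific and is not modeled.

```python
CALL_PAL_OPCODE = 0x00      # assumption: CALL_PAL opcode value in bits [31:26]
FUNCTION_MASK = 0x03FFFFFF  # bits [25:0] of the 32-bit instruction word


def decode_call_pal(instruction):
    """Return the 26-bit function field of a CALL_PAL word, or None otherwise."""
    opcode = (instruction >> 26) & 0x3F
    if opcode != CALL_PAL_OPCODE:
        return None  # not a CALL_PAL instruction
    return instruction & FUNCTION_MASK
```

For example, decode_call_pal(0x00000083) yields function code 0x83, while a word whose
opcode field is nonzero yields None.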


The Alpha architecture performs byte shifting and masking with normal 64-bit, regis- 
ter-to-register instructions. The 21264A performs single-byte and single-word load and 
store instructions. 


1.1.1 Addressing 


The basic addressable unit in the Alpha architecture is the 8-bit byte. The 21264A sup- 
ports a 48-bit or 43-bit virtual address (selectable under IPR control). 


Virtual addresses as seen by the program are translated into physical memory addresses 
by the memory-management mechanism. The 21264A supports a 44-bit physical 
address. 
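

To put the address widths above in concrete terms, the following Python sketch computes
the size of each address space. The function name is illustrative only; the bit widths
are the ones quoted in this section.

```python
TIB = 1 << 40  # one tebibyte, in bytes


def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << address_bits


# Widths quoted in the text: 48-bit or 43-bit virtual, 44-bit physical.
for label, bits in (("48-bit virtual", 48), ("43-bit virtual", 43), ("44-bit physical", 44)):
    print(label, address_space_bytes(bits) // TIB, "TiB")
```

Under these definitions the 43-bit virtual space covers 8 TiB, the 48-bit virtual space
256 TiB, and the 44-bit physical space 16 TiB.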


1.1.2 Integer Data Types 


The Alpha architecture supports the four integer data types listed in Table 1-1.


Table 1-1 Integer Data Types 


Data Type   Description

Byte        A byte is 8 contiguous bits that start at an addressable byte boundary.
            A byte is an 8-bit value.
Word        A word is 2 contiguous bytes that start at an arbitrary byte boundary.
            A word is a 16-bit value.
Longword    A longword is 4 contiguous bytes that start at an arbitrary byte boundary.
            A longword is a 32-bit value.
Quadword    A quadword is 8 contiguous bytes that start at an arbitrary byte boundary.
            A quadword is a 64-bit value.

Note: Alpha implementations may impose a significant performance penalty when
accessing operands that are not naturally aligned. Refer to the Alpha Architecture
Handbook, Version 4, for details.


1.1.3 Floating-Point Data Types 


The 21264A supports the following floating-point data types: 
• Longword integer format in floating-point unit
• Quadword integer format in floating-point unit
• IEEE floating-point formats
  - S_floating
  - T_floating
• VAX floating-point formats
  - F_floating
  - G_floating
  - D_floating (limited support)


1.2 21264A Microprocessor Features


The 21264A microprocessor is a superscalar pipelined processor. It is packaged in a
587-pin PGA carrier and has removable application-specific heat sinks. A number of
configuration options allow its use in system designs ranging from extremely
simple uniprocessor systems with minimum component count to high-performance
multiprocessor systems with very high cache and memory bandwidth.


The 21264A can issue four Alpha instructions in a single cycle, thereby minimizing the 
average cycles per instruction (CPI). A number of low-latency and/or high-throughput 
features in the instruction issue unit and the onchip components of the memory sub- 
system further reduce the average CPI. 


The 21264A and associated PALcode implement the IEEE single-precision and
double-precision and the VAX F_floating and G_floating data types, and support
longword (32-bit) and quadword (64-bit) integers. Byte (8-bit) and word (16-bit)
support is provided by byte-manipulation instructions. Limited hardware support is
provided for the VAX D_floating data type.


Other 21264A features include: 
• The ability to issue up to four instructions during each CPU clock cycle.

• A peak instruction execution rate of four times the CPU clock frequency.

• An onchip, demand-paged memory-management unit with translation buffer, which,
  when used with PALcode, can implement a variety of page table structures and
  translation algorithms. The unit consists of a 128-entry, fully-associative data
  translation buffer (DTB) and a 128-entry, fully-associative instruction translation
  buffer (ITB), with each entry able to map a single 8KB page or a group of 8, 64, or
  512 8KB pages. The allocation scheme for the ITB and DTB is round-robin. The size
  of each translation buffer entry’s group is specified by hint bits stored in the
  entry. The DTB and ITB implement 8-bit address space numbers (ASN),
  MAX_ASN=255.

• Two onchip, high-throughput pipelined floating-point units, capable of executing
  both VAX and IEEE floating-point data types.

• An onchip, 64KB virtually-addressed instruction cache with 8-bit ASNs
  (MAX_ASN=255).

• An onchip, virtually-indexed, physically-tagged dual-read-ported, 64KB data
  cache.

• Supports a 48-bit or 43-bit virtual address (program selectable).

• Supports a 44-bit physical address.

• An onchip I/O write buffer with four 64-byte entries for I/O write transactions.

• An onchip, 8-entry victim data buffer.

• An onchip, 32-entry load queue.

• An onchip, 32-entry store queue.

• An onchip, 8-entry miss address file for cache fill requests and I/O read
  transactions.

• An onchip, 8-entry probe queue, holding pending system port probe commands.

• An onchip, duplicate tag array used to maintain level 2 cache coherency.

• A 64-bit data bus with onchip parity and error correction code (ECC) support.

• Support for an external second-level (Bcache) cache. The size and some timing
  parameters of the Bcache are programmable.

• An internal clock generator providing a high-speed clock used by the 21264A, and
  two clocks for use by the CPU module.

• Onchip performance counters to measure and analyze CPU and system
  performance.

• Chip and module level test support, including an instruction cache test interface to
  support chip and module level testing.

• A 2.0-V external interface.


Refer to Chapter 9 for 21264A dc and ac electrical characteristics. Refer to the Alpha 
Architecture Handbook, Version 4, Appendix E, for waivers and any other implementa- 
tion-dependent information. 


Internal Architecture


This chapter provides both an overview of the 21264A microarchitecture and a system
designer’s view of the 21264A implementation of the Alpha architecture. The combina- 
tion of the 21264A microarchitecture and privileged architecture library code (PALcode) 
defines the chip’s implementation of the Alpha architecture. If a certain piece of hardware 
seems to be “architecturally incomplete,” the missing functionality is implemented in 
PALcode. Chapter 6 provides more information on PALcode. 


This chapter describes the major functional hardware units and is not intended to be a 
detailed hardware description of the chip. It is organized as follows: 


• 21264A microarchitecture

• Pipeline organization

• Instruction issue and retire rules

• Load instructions to R31/F31 (software-directed instruction prefetch)

• Special cases of Alpha instruction execution

• Memory and I/O address space

• Miss address file (MAF) and load-merging rules

• Instruction ordering

• Replay traps

• I/O write buffer and the WMB instruction

• Performance measurement support

• Floating-point control register

• AMASK and IMPLVER instruction values

• Design examples


2.1 21264A Microarchitecture 


The 21264A microprocessor is a high-performance third-generation implementation of 
the Compaq Alpha architecture. The 21264A consists of the following sections, as 
shown in Figure 2-1: 


• Instruction fetch, issue, and retire unit (Ibox)

• Integer execution unit (Ebox)

• Floating-point execution unit (Fbox)

• Onchip caches (Icache and Dcache)

• Memory reference unit (Mbox)

• External cache and system interface unit (Cbox)

• Pipeline operation sequence


2.1.1 Instruction Fetch, Issue, and Retire Unit 


The instruction fetch, issue, and retire unit (Ibox) consists of the following subsections: 


• Virtual program counter logic

• Branch predictor

• Instruction-stream translation buffer (ITB)

• Instruction fetch logic

• Register rename maps

• Integer and floating-point issue queues

• Exception and interrupt logic

• Retire logic


2.1.1.1 Virtual Program Counter Logic 


The virtual program counter (VPC) logic maintains the virtual addresses for instruc- 
tions that are in flight. There can be up to 80 instructions, in 20 successive fetch slots, in 
flight between the register rename mappers and the end of the pipeline. The VPC logic 
contains a 20-entry table to store these fetched VPC addresses. 


Figure 2-1 21264A Block Diagram


2.1.1.2 Branch Predictor


The branch predictor is composed of three units: the local, global, and choice
predictors. Figure 2-2 shows how the branch predictor generates the predicted branch
address.


Figure 2-2 Branch Predictor


Local Predictor


The local predictor uses a 2-level table that holds the history of individual branches.
The 2-level table design approaches the prediction accuracy of a larger single-level
table while requiring fewer total bits of storage. Figure 2-3 shows how the local
predictor generates a prediction. Bits [11:2] of the VPC of the current branch are used
as the index to a 1K entry table in which each entry is a 10-bit value. This 10-bit
value is used as the index to a 1K entry table of 3-bit saturating counters. The value
of the saturating counter determines the prediction, taken/not-taken, of the current
branch.
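

The two-level lookup described above can be sketched in Python as follows. The table
sizes and the VPC[11:2] index come from the text. The prediction threshold (taken when
the 3-bit counter is 4 or more), the counter update rule, and the way an outcome is
shifted into the 10-bit history are assumptions made for illustration; this
specification does not state them.

```python
class LocalPredictor:
    """Illustrative model of a 2-level local branch predictor."""

    def __init__(self):
        self.history = [0] * 1024   # 1K entries, each a 10-bit branch history
        self.counters = [0] * 1024  # 1K 3-bit saturating counters (0..7)

    @staticmethod
    def _index(vpc):
        return (vpc >> 2) & 0x3FF   # VPC[11:2]

    def predict(self, vpc):
        pattern = self.history[self._index(vpc)]
        return self.counters[pattern] >= 4   # assumed threshold: upper half = taken

    def update(self, vpc, taken):
        i = self._index(vpc)
        pattern = self.history[i]
        count = self.counters[pattern]
        # Assumed update rule: saturating increment on taken, decrement otherwise.
        self.counters[pattern] = min(count + 1, 7) if taken else max(count - 1, 0)
        # Assumed history update: shift the newest outcome into the low bit.
        self.history[i] = ((pattern << 1) | int(taken)) & 0x3FF
```

In this model a branch that strictly alternates between taken and not-taken is
predicted correctly once its history has warmed up, because the two alternating
history patterns select different counters.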


Figure 2-3 Local Predictor


Global Predictor


The global predictor is indexed by a global history of all recent branches. The global
predictor correlates the local history of the current branch with all recent branches.
Figure 2-4 shows how the global predictor generates a prediction. The global path
history consists of the taken/not-taken state of the 12 most-recent branches. These 12
states are used to form an index into a 4K entry table of 2-bit saturating counters. The
value of the saturating counter determines the prediction, taken/not-taken, of the
current branch.


Figure 2-4 Global Predictor


Choice Predictor


The choice predictor monitors the history of the local and global predictors and chooses 
the best of the two predictors for a particular branch. Figure 2-5 shows how the choice 
predictor generates its choice of the result of the local or global prediction. The 12-bit 
global path history (see Figure 2-4) is used to index a 4K entry table of 2-bit saturating 
counters. The value of the saturating counter determines the choice between the outputs 
of the local and global predictors. 
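

The interaction between the global and choice predictors can be sketched in Python as
shown below. The 12-bit path history and the two 4K tables of 2-bit counters come from
the text. The thresholds, the counter update rules, and the policy of training the
choice counter only when the two predictors disagree are assumptions for illustration,
not a description of the actual circuit. The local prediction is passed in as an
argument so that the sketch stands alone.

```python
class GlobalChoicePredictor:
    """Illustrative model of the global predictor plus the choice predictor."""

    HISTORY_BITS = 12

    def __init__(self):
        size = 1 << self.HISTORY_BITS       # 4K entries
        self.mask = size - 1
        self.path_history = 0               # outcomes of the 12 most recent branches
        self.global_counters = [0] * size   # 2-bit saturating counters (0..3)
        self.choice_counters = [0] * size   # 2-bit saturating counters (0..3)

    def global_prediction(self):
        return self.global_counters[self.path_history] >= 2

    def predict(self, local_prediction):
        # Assumed encoding: upper half of the choice counter selects the global result.
        use_global = self.choice_counters[self.path_history] >= 2
        return self.global_prediction() if use_global else local_prediction

    def update(self, local_prediction, taken):
        index = self.path_history
        global_prediction = self.global_counters[index] >= 2
        if global_prediction != local_prediction:
            # Assumed policy: move toward whichever predictor was right.
            c = self.choice_counters[index]
            if global_prediction == taken:
                self.choice_counters[index] = min(c + 1, 3)
            else:
                self.choice_counters[index] = max(c - 1, 0)
        g = self.global_counters[index]
        self.global_counters[index] = min(g + 1, 3) if taken else max(g - 1, 0)
        self.path_history = ((index << 1) | int(taken)) & self.mask
```

With a local predictor that is consistently wrong about an always-taken branch, the
choice counter in this model drifts toward the global predictor, and the combined
prediction becomes correct.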


Figure 2-5 Choice Predictor 


2.1.1.3 Instruction-Stream Translation Buffer


The Ibox includes a 128-entry, fully-associative instruction-stream translation buffer 
(ITB) that is used to store recently used instruction-stream (Istream) address transla- 
tions and page protection information. Each of the entries in the ITB can map 1, 8, 64, 
or 512 contiguous 8KB pages. The allocation scheme is round-robin. 
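

As a sketch of how one entry can cover 1, 8, 64, or 512 pages, the Python fragment
below compares a virtual address against a translation buffer entry. It assumes a
2-bit granularity hint gh in which the value n selects a group of 8**n pages, as in
the Alpha architecture; the function names and the entry representation are
hypothetical.

```python
PAGE_SHIFT = 13  # 8KB pages


def entry_span_bytes(gh):
    """Bytes mapped by one entry for granularity hint gh (0..3, assumed encoding)."""
    return (8 ** gh) << PAGE_SHIFT


def entry_matches(entry_va, gh, va):
    """True if va falls inside the naturally aligned region mapped by the entry."""
    shift = PAGE_SHIFT + 3 * gh  # each hint step multiplies the span by 8
    return (entry_va >> shift) == (va >> shift)
```

With gh equal to 3, a single entry spans 512 pages (4MB), so one entry can stand in
for what would otherwise be 512 separate 8KB translations.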


The ITB supports an 8-bit ASN and contains an ASM bit. The Icache is virtually 
addressed and contains the access-check information, so the ITB is accessed only for 
Istream references that miss in the Icache. 


Istream transactions to I/O address space are UNDEFINED. 


2.1.1.4 Instruction Fetch Logic


The instruction prefetcher (predecode) reads an octaword, containing up to four natu- 
rally aligned instructions per cycle, from the Icache. Branch prediction and line predic- 
tion bits accompany the four instructions. The branch prediction scheme operates most 
efficiently when only one branch instruction is contained among the four fetched 
instructions. The line prediction scheme attempts to predict the Icache line that the 
branch predictor will generate, and is described in Section 2.2. 
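

The phrase "up to four naturally aligned instructions" can be made concrete with a
small Python sketch. It assumes that a fetch reads the aligned 16-byte octaword
containing the fetch address and that only the instructions from that address to the
end of the octaword are useful; the function name is illustrative.

```python
OCTAWORD_BYTES = 16
INSTRUCTION_BYTES = 4


def fetch_slots(vpc):
    """Addresses of the useful instructions in the octaword that contains vpc."""
    base = vpc & ~(OCTAWORD_BYTES - 1)             # aligned octaword address
    first_slot = (vpc - base) // INSTRUCTION_BYTES
    slots_per_octaword = OCTAWORD_BYTES // INSTRUCTION_BYTES
    return [base + slot * INSTRUCTION_BYTES
            for slot in range(first_slot, slots_per_octaword)]
```

Under this assumption, a fetch that starts on an octaword boundary supplies four
instructions, while a branch target in the third slot supplies only two.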


An entry from the subroutine return prediction stack, together with set prediction bits
for use by the Icache stream controller, is fetched along with the octaword. The Icache
stream controller generates fetch requests for additional Icache lines and stores the
Istream data in the Icache. There is no separate buffer to hold Istream requests.


2.1.1.5 Register Rename Maps 


The instruction prefetcher forwards instructions to the integer and floating-point regis- 
ter rename maps. The rename maps perform the two functions listed here: 


• Eliminate register write-after-read (WAR) and write-after-write (WAW) data
  dependencies while preserving true read-after-write (RAW) data dependencies, in
  order to allow instructions to be dynamically rescheduled.


• Provide a means of speculatively executing instructions before the control flow
  previous to those instructions is resolved. Both exceptions and branch
  mispredictions represent deviations from the control flow predicted by the
  instruction prefetcher.


The map logic translates each instruction’s operand register specifiers from the virtual 
register numbers in the instruction to the physical register numbers that hold the corre- 
sponding architecturally-correct values. The map logic also renames each instruction’s 
destination register specifier from the virtual number in the instruction to a physical 
register number chosen from a list of free physical registers, and updates the register 
maps. 


The map logic can process four instructions per cycle. It does not return the physical 
register, which holds the old value of an instruction’s virtual destination register, to the 
free list until the instruction has been retired, indicating that the control flow up to that 
instruction has been resolved. 


If a branch mispredict or exception occurs, the map logic backs up the contents of the 
integer and floating-point register rename maps to the state associated with the instruc- 
tion that triggered the condition, and the prefetcher restarts at the appropriate VPC. At 
most, 20 valid fetch slots containing up to 80 instructions can be in flight between the 
register maps and the end of the machine’s pipeline, where the control flow is finally 
resolved. The map logic is capable of backing up the contents of the maps to the state 
associated with any of these 80 instructions in a single cycle. 
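

The renaming, retirement, and back-up behavior described above can be sketched with a
small Python model. The class and method names are hypothetical, the register counts
are parameters rather than the 21264A values, and the model undoes one instruction at
a time, whereas the hardware restores the maps in a single cycle. Stalling when the
free list is empty is not modeled.

```python
class RenameMap:
    """Illustrative register rename map with a free list."""

    def __init__(self, arch_regs, phys_regs):
        self.map = list(range(arch_regs))              # virtual -> physical
        self.free = list(range(arch_regs, phys_regs))  # free physical registers
        self.in_flight = []  # (dest, old_phys, new_phys), oldest first

    def rename(self, dest, sources):
        src_phys = [self.map[s] for s in sources]  # true (RAW) dependencies kept
        new_phys = self.free.pop(0)                # fresh register removes WAR/WAW
        self.in_flight.append((dest, self.map[dest], new_phys))
        self.map[dest] = new_phys
        return new_phys, src_phys

    def retire_oldest(self):
        # The old mapping is returned to the free list only at retirement.
        _dest, old_phys, _new_phys = self.in_flight.pop(0)
        self.free.append(old_phys)

    def squash_youngest(self):
        # Undo the most recent rename, as after a mispredict or exception.
        dest, old_phys, new_phys = self.in_flight.pop()
        self.map[dest] = old_phys
        self.free.insert(0, new_phys)
```

Two successive writes to the same virtual register receive different physical
registers in this model, which is what allows them to execute out of order.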


The register rename logic places instructions into an integer or floating-point issue 
queue, from which they are later issued to functional units for execution. 


2.1.1.6 Integer Issue Queue 


The 20-entry integer issue queue (IQ), associated with the integer execution units 
(Ebox), issues the following types of instructions at a maximum rate of four per cycle: 


• Integer operate

• Integer conditional branch

• Unconditional branch, both displacement and memory format

• Integer and floating-point load and store

• PAL-reserved instructions: HW_MTPR, HW_MFPR, HW_LD, HW_ST,
  HW_RET

• Integer-to-floating-point (ITOFx) and floating-point-to-integer (FTOIx)


Each queue entry asserts four request signals, one for each of the Ebox subclusters. A
queue entry asserts a request when it contains an instruction that can be executed by the
subcluster, if the instruction’s operand register values are available within the
subcluster.


There are two arbiters, one for the upper subclusters and one for the lower subclusters.
(Subclusters are described in Section 2.1.2.) Each arbiter picks two of the possible 20
requesters for service each cycle. A given instruction requests either the upper
subclusters or the lower subclusters, never both. Because many instructions can be
executed in only one type of subcluster, this restriction is not too limiting.


For example, load and store instructions can only go to lower subclusters and shift 
instructions can only go to upper subclusters. Other instructions, such as addition and 
logic operations, can execute in either upper or lower subclusters and are statically 
assigned before being placed in the IQ. 


The IQ arbiters choose between simultaneous requesters of a subcluster based on the
age of the request: older requests are given priority over newer requests. If a given
instruction requests both lower subclusters, and no older instruction requests a lower
subcluster, then the arbiter assigns subcluster L0 to the instruction. If a given
instruction requests both upper subclusters, and no older instruction requests an upper
subcluster, then the arbiter assigns subcluster U1 to the instruction. This asymmetry
between the upper and lower subcluster arbiters is a circuit implementation optimization
with negligible overall performance effect.
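

The age-based selection above can be sketched in Python for a single arbiter. The text
gives the priority rule and the preferred subcluster (L0 for the lower arbiter, U1 for
the upper arbiter); the greedy, oldest-first assignment used here, including letting a
younger instruction take a subcluster that an older one did not request, is an
assumption made for illustration.

```python
def arbitrate(queue, preferred, other):
    """Grant up to two subclusters to the oldest eligible requesters.

    queue: list of (name, requested_subclusters) tuples, oldest first.
    Returns a dict mapping subcluster name to the granted instruction name.
    """
    grants = {}
    for name, wanted in queue:
        for subcluster in (preferred, other):
            if subcluster in wanted and subcluster not in grants:
                grants[subcluster] = name
                break
        if len(grants) == 2:   # each arbiter picks at most two requesters per cycle
            break
    return grants


lower = arbitrate([("load", {"L0", "L1"}), ("add", {"L0", "L1"}),
                   ("store", {"L0", "L1"})], preferred="L0", other="L1")
upper = arbitrate([("shift", {"U0", "U1"})], preferred="U1", other="U0")
```

In this sketch the oldest instruction that requests both lower subclusters receives
L0, and a lone instruction that requests both upper subclusters receives U1, matching
the asymmetry described in the text.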


2.1.1.7 Floating-Point Issue Queue 


The 15-entry floating-point issue queue (FQ) associated with the Fbox issues the fol- 
lowing instruction types: 


• Floating-point operates

• Floating-point conditional branches

• Floating-point stores

• Floating-point register to integer register transfers (FTOIx)


Each queue entry has three request lines—one for the add pipeline, one for the multiply 
pipeline, and one for the two store pipelines. There are three arbiters—one for each of 
the add, multiply, and store pipelines. The add and multiply arbiters pick one requester 
per cycle, while the store pipeline arbiter picks two requesters per cycle, one for each 
store pipeline. 


The FQ arbiters pick between simultaneous requesters of a pipeline based on the age of
the request: older requests are given priority over newer requests. Floating-point store
instructions and FTOIx instructions in even-numbered queue entries arbitrate for one
store port. Floating-point store instructions and FTOIx instructions in odd-numbered
queue entries arbitrate for the second store port.


Floating-point store instructions and FTOIx instructions are queued in both the integer
and floating-point queues. They wait in the floating-point queue until their operand
register values are available. They subsequently request service from the store arbiter.
Upon being issued from the floating-point queue, they signal the corresponding entry in
the integer queue to request service. Upon being issued from the integer queue, the
operation is completed.


2.1.1.8 Exception and Interrupt Logic 


There are two types of exceptions: faults and synchronous traps. Arithmetic exceptions 
are precise and are reported as synchronous traps. 


The four sources of interrupts are as follows:

• Level-sensitive hardware interrupts sourced by the IRQ_H[5:0] pins

• Edge-sensitive hardware interrupts generated by the serial line receive pin,
  performance counter overflows, and hardware corrected read errors

• Software interrupts sourced by the software interrupt request (SIRR) register

• Asynchronous system traps (ASTs)


Interrupt sources can be individually masked. In addition, AST interrupts are qualified 
by the current processor mode. 


2.1.1.9 Retire Logic 


The Ibox fetches instructions in program order, executes them out of order, and then 
retires them in order. The Ibox retire logic maintains the architectural state of the 
machine by retiring an instruction only if all previous instructions have executed with- 
out generating exceptions or branch mispredictions. Retiring an instruction commits the 
machine to any changes the instruction may have made to the software-visible state. 
The three software-visible states are listed as follows: 


• Integer and floating-point registers

• Memory

• Internal processor registers (including control/status registers and translation
  buffers)


The retire logic can sustain a maximum retire rate of eight instructions per cycle, and
can retire as many as 11 instructions in a single cycle.
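

In-order retirement can be sketched in Python as follows. The model retires
instructions from the head of a program-ordered queue and stops at the first one that
has not executed or that raised an exception. The field names are hypothetical, only
the single-cycle peak of 11 is modeled, and the sustained limit of eight per cycle is
ignored.

```python
from collections import deque

MAX_RETIRE_PER_CYCLE = 11  # single-cycle peak quoted in the text


def retire_cycle(in_flight):
    """Retire instructions in program order; return the names retired this cycle.

    in_flight: deque of dicts, oldest first, with keys 'name', 'executed', 'fault'.
    """
    retired = []
    while in_flight and len(retired) < MAX_RETIRE_PER_CYCLE:
        head = in_flight[0]
        if not head["executed"] or head["fault"]:
            break  # younger instructions must wait, even if they have executed
        retired.append(in_flight.popleft()["name"])
    return retired
```

An instruction that has finished executing therefore remains unretired in this model
for as long as any older instruction is still pending.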


2.1.2 Integer Execution Unit 


The integer execution unit (Ebox) is a 4-path integer execution unit that is implemented 
as two functional-unit “clusters” labeled 0 and 1. Each cluster contains a copy of an 80- 
entry, physical-register file and two “subclusters”, named upper (U) and lower (L). Fig- 
ure 2-6 shows the integer execution unit. In the figure, iop_wr is the cross-cluster bus 
for moving integer result values between clusters. 


Figure 2-6 Integer Execution Unit—Clusters 0 and 1


Most instructions have 1-cycle latency for consumers that execute within the same
cluster. Also, there is another 1-cycle delay associated with producing a value in one
cluster and consuming the value in the other cluster. The instruction issue queue
minimizes the performance effect of this cross-cluster delay. The Ebox contains the
following resources:


• Four 64-bit adders that are used to calculate results for integer add instructions
  (located in U0, U1, L0, and L1)

• The adders in the lower subclusters that are used to generate the effective virtual
  address for load and store instructions (located in L0 and L1)

• Four logic units

• Two barrel shifters and associated byte logic (located in U0 and U1)

• Two sets of conditional branch logic (located in U0 and U1)

• Two copies of an 80-entry register file

• One pipelined multiplier (located in U1) with 7-cycle latency for all integer multiply
  operations

• One fully-pipelined unit (located in U0), with 3-cycle latency, that executes the
  following instructions:

  - CTLZ, CTPOP, CTTZ
  - PERR, MINxxx, MAXxxx, UNPKxx, PKxx


The Ebox has 80 register-file entries that contain storage for the values of the 31 Alpha
integer registers (the value of R31 is not stored), the values of 8 PALshadow registers,
and 41 results written by instructions that have not yet been retired.


Ignoring cross-cluster delay, the two copies of the Ebox register file contain identical
values. Each copy of the Ebox register file contains four read ports and six write ports.
The four read ports are used to source operands to each of the two subclusters within a
cluster. The six write ports are used as follows:


• Two write ports are used to write results generated within the cluster.

• Two write ports are used to write results generated by the other cluster.

• Two write ports are used to write results from load instructions. These two ports
  are also used for FTOIx instructions.


2.1.3 Floating-Point Execution Unit 


The floating-point execution unit (Fbox) has two paths. The Fbox executes both VAX
and IEEE floating-point instructions. It supports IEEE S_floating-point and
T_floating-point data types and all rounding modes. It also supports VAX
F_floating-point and G_floating-point data types, and provides limited support for
D_floating-point format. The basic structure of the floating-point execution unit is
shown in Figure 2-7.


Figure 2-7 Floating-Point Execution Units 


The Fbox contains the following resources:

• 72-entry physical register file

• Fully-pipelined multiplier with 4-cycle latency

• Fully-pipelined adder with 4-cycle latency

• Nonpipelined divide unit associated with the adder pipeline

• Nonpipelined square root unit associated with the adder pipeline


The 72 Fbox register file entries contain storage for the values of the 31 Alpha floating- 
point registers (F31 is not stored) and 41 values written by instructions that have not 
been retired. 


The Fbox register file contains six read ports and four write ports. Four read ports are
used to source operands to the add and multiply pipelines, and two read ports are used
to source data for store instructions. Two write ports are used to write results generated
by the add and multiply pipelines, and two write ports are used to write results from
floating-point load instructions.


2.1.4 External Cache and System Interface Unit 


The interface for the system and external cache (Cbox) controls the Bcache and system 
ports. It contains the following structures: 


• Victim address file (VAF)

• Victim data file (VDF)

• I/O write buffer (IOWB)

• Probe queue (PQ)

• Duplicate Dcache tag (DTAG)


2.1.4.1 Victim Address File and Victim Data File


The victim address file (VAF) and victim data file (VDF) together form an 8-entry
victim buffer used for holding:


• Dcache blocks to be written to the Bcache

• Istream cache blocks from memory to be written to the Bcache

• Bcache blocks to be written to memory

• Cache blocks sent to the system in response to probe commands


2.1.4.2 I/O Write Buffer


The I/O write buffer (IOWB) consists of four 64-byte entries and associated address
and control logic used for buffering I/O write data between the store queue and the
system port.


2.1.4.3 Probe Queue 


The probe queue (PQ) is an 8-entry queue that holds pending system port cache probe 
commands and addresses. 


2.1.4.4 Duplicate Dcache Tag Array 


The duplicate Dcache tag (DTAG) array holds a duplicate copy of the Dcache tags and 
is used by the Cbox when processing Dcache fills, Icache fills, and system port probes. 


2.1.5 Onchip Caches 


The 21264A contains two onchip primary-level caches. 


2.1.5.1 Instruction Cache


The instruction cache (Icache) is a 64KB virtually addressed, 2-way set-predict cache.
Set prediction is used to approximate the performance of a 2-set cache without slowing
the cache access time. Each Icache block contains:


• 16 Alpha instructions (64 bytes)

• Virtual tag bits [47:15]

• 8-bit address space number (ASN) field

• 1-bit address space match (ASM) bit

• 1-bit PALcode bit to indicate physical addressing

• Valid bit

• Data and tag parity bits

• Four access-check bits for the following modes: kernel, executive, supervisor, and
  user (KESU)

• Additional predecoded information to assist with instruction processing and fetch
  control


2.1.5.2 Data Cache 


The data cache (Dcache) is a 64KB, 2-way set-associative, virtually indexed, physically 
tagged, write-back, read/write allocate cache with 64-byte blocks. During each cycle 
the Dcache can perform one of the following transactions: 


• Two quadword (or shorter) read transactions to arbitrary addresses

• Two quadword write transactions to the same aligned octaword

• Two non-overlapping less-than-quadword writes to the same aligned quadword

• One sequential read and write transaction from and to the same aligned octaword


Each Dcache block contains:

• 64 data bytes and associated quadword ECC bits

• Physical tag bits

• Valid, dirty, shared, and modified bits

• Tag parity bit calculated across the tag, dirty, shared, and modified bits

• One bit to control round-robin set allocation (one bit per two cache blocks)


The Dcache contains two sets, each with 512 rows containing 64-byte blocks per row 
(that is, 32K bytes of data per set). The 21264A requires two additional bits of virtual 
address beyond the bits that specify an 8KB page, in order to specify a Dcache row 
index. A given virtual address might be found in four unique locations in the Dcache, 
depending on the virtual-to-physical translation for those two bits. The 21264A pre- 
vents this aliasing by keeping only one of the four possible translated addresses in the 
cache at any time. 
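

The origin of the two extra index bits can be worked through in Python. The block
size, row count, and page size are taken from the text; the helper names are
illustrative, and the enumeration below simply lists the rows selected by the four
possible values of the two bits above the 8KB page offset.

```python
BLOCK_BYTES = 64
ROWS_PER_SET = 512
PAGE_BYTES = 8192

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # 6 bits select a byte within a block
INDEX_BITS = ROWS_PER_SET.bit_length() - 1   # 9 bits select a row
PAGE_BITS = PAGE_BYTES.bit_length() - 1      # 13 bits are untranslated page offset
EXTRA_BITS = OFFSET_BITS + INDEX_BITS - PAGE_BITS  # index bits above the page offset


def dcache_row(address):
    """Row selected by an address: bits [14:6]."""
    return (address >> OFFSET_BITS) & (ROWS_PER_SET - 1)


def alias_rows(address):
    """Rows selected for each possible value of the bits above the page offset."""
    low = address & (PAGE_BYTES - 1)
    return [dcache_row((value << PAGE_BITS) | low)
            for value in range(1 << EXTRA_BITS)]
```

Here EXTRA_BITS evaluates to 2, so alias_rows returns four rows spaced 128 apart.
These are the four candidate locations of which the 21264A keeps only one valid at
any time.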


2.1.6 Memory Reference Unit 


The memory reference unit (Mbox) controls the Dcache and ensures architecturally 
correct behavior for load and store instructions. The Mbox contains the following struc- 
tures: 


• Load queue (LQ)

• Store queue (SQ)

• Miss address file (MAF)

• Dstream translation buffer (DTB)


2.1.6.1 Load Queue


The load queue (LQ) is a reorder buffer for load instructions. It contains 32 entries and
maintains the state associated with load instructions that have been issued to the Mbox,
but for which results have not been delivered to the processor and the instructions
retired. The Mbox assigns load instructions to LQ slots based on the order in which
they were fetched from the Icache, then places them into the LQ after they are issued by
the IQ. The LQ helps ensure correct Alpha memory reference behavior.


2.1.6.2 Store Queue 


The store queue (SQ) is a reorder buffer and graduation unit for store instructions. It 
contains 32 entries and maintains the state associated with store instructions that have 
been issued to the Mbox, but for which data has not been written to the Dcache and the 
instruction retired. The Mbox assigns store instructions to SQ slots based on the order 
in which they were fetched from the Icache and places them into the SQ after they are 
issued by the IQ. The SQ holds data associated with store instructions issued from the 
IQ until they are retired, at which point the store can be allowed to update the Dcache. 
The SQ also helps ensure correct Alpha memory reference behavior. 


2.1.6.3 Miss Address File 


The 8-entry miss address file (MAF) holds physical addresses associated with pending 
Icache and Dcache fill requests and pending I/O space read transactions.


2.1.6.4 Dstream Translation Buffer 


The Mbox includes a 128-entry, fully associative Dstream translation buffer (DTB) used 
to store Dstream address translations and page protection information. Each of the entries 
in the DTB can map 1, 8, 64, or 512 contiguous 8KB pages. The allocation scheme is 
round-robin. The DTB supports an 8-bit ASN and contains an ASM bit. 


2.1.7 SROM Interface 


The serial read-only memory (SROM) interface provides the initialization data load 
path from a system SROM to the Icache. Refer to Chapter 7 for more information. 


2.2 Pipeline Organization 


The 7-stage pipeline provides an optimized environment for executing Alpha instructions.
The pipeline stages (0 to 6) are shown in Figure 2-8 and described in the following
paragraphs.




Figure 2-8 Pipeline Organization 


Stage 0 — Instruction Fetch


The branch predictor uses a branch history algorithm to predict a branch instruction
target address.


Up to four aligned instructions are fetched from the Icache, in program order. The 
branch prediction tables are also accessed in this cycle. The branch predictor uses tables 
and a branch history algorithm to predict a branch instruction target address for one 
branch or memory format JSR instruction per cycle. Therefore, the prefetcher is limited 
to fetching through one branch per cycle. If there is more than one branch within the 
fetch line, and the branch predictor predicts that the first branch will not be taken, it will 
predict through subsequent branches at the rate of one per cycle, until it predicts a taken 
branch or predicts through the last branch in the fetch line. 


The Icache array also contains a line prediction field, the contents of which are applied 
to the Icache in the next cycle. The purpose of the line predictor is to remove the pipe- 
line bubble which would otherwise be created when the branch predictor predicts a 
branch to be taken. In effect, the line predictor attempts to predict the Icache line which 
the branch predictor will generate. On fills, the line predictor value at each fetch line is 
initialized with the index of the next sequential fetch line, and later retrained by the 
branch predictor if necessary. 


Stage 1 — Instruction Slot


The Ibox maps four instructions per cycle from the 64KB 2-way set-predict Icache. 
Instructions are mapped in order, executed dynamically, but are retired in order. 


In the slot stage, the branch predictor compares the next Icache index that it generates to
the index that was generated by the line predictor. If there is a mismatch, the branch 
predictor wins—the instructions fetched during that cycle are aborted, and the index 
predicted by the branch predictor is applied to the Icache during the next cycle. Line 
mispredictions result in one pipeline bubble. 


The line predictor takes precedence over the branch predictor during memory format 
calls or jumps. If the line predictor was trained with a true (as opposed to predicted) 
memory format call or jump target, then its contents take precedence over the target 
hint field associated with these instructions. This allows dynamic calls or jumps to be 
correctly predicted. 


The instruction fetcher produces the full VPC address during the fetch stage of the pipe- 
line. The Icache produces the tags for both Icache sets 0 and 1 each time it is accessed. 
That enables the fetcher to separate set mispredictions from true Icache misses. If the 
access was caused by a set misprediction, the instruction fetcher aborts the last two 
fetched slots and refetches the slot in the next cycle. It also retrains the appropriate set 
prediction bits. 


The instruction data is transferred from the Icache to the integer and floating-point reg- 
ister map hardware during this stage. When the integer instruction is fetched from the 
Icache and slotted into the IQ, the slot logic determines whether the instruction is for 
the upper or lower subclusters. The slot logic makes the decision based on the 
resources needed by the (up to four) integer instructions in the fetch block. Although all 
four instructions need not be issued simultaneously, distributing their resource usage 
improves instruction loading across the units. For example, if a fetch block contains 
two instructions that can be placed in either cluster followed by two instructions that 
must execute in the lower cluster, the slot logic would designate that combination as 
EELL and slot them as UULL. Slot combinations are described in Section 2.3.2 and 
Table 2-3. 


Stage 2 — Map 


Instructions are sent from the Icache to the integer and floating-point register maps dur- 
ing the slot stage and register renaming is performed during the map stage. Also, each 
instruction is assigned a unique 8-bit number, called an inum, which is used to identify 
the instruction and its program order with respect to other instructions during the time 
that it is in flight. Instructions are considered to be in flight between the time they are 
mapped and the time they are retired. 


Mapped instructions and their associated inums are placed in the integer and floating- 
point queues by the end of the map stage. 


Stage 3 — Issue 


The 20-entry integer issue queue (IQ) issues instructions at the rate of four per cycle.
The 15-entry floating-point issue queue (FQ) issues floating-point operate instructions,
conditional branch instructions, and store instructions, at the rate of two per cycle.
Normally, instructions are deleted from the IQ or FQ two cycles after they are issued. For
example, if an instruction is issued in cycle n, it remains in the FQ or IQ in cycle n+1
but does not request service, and is deleted in cycle n+2.


Stage 4 — Register Read


Instructions issued from the issue queues read their operands from the integer and float- 
ing-point register files and receive bypass data. 


Stage 5 — Execute


The Ebox and Fbox pipelines begin execution.


Stage 6 — Dcache Access


Memory reference instructions access the Dcache and data translation buffers. Nor- 
mally load instructions access the tag and data arrays while store instructions only 
access the tag arrays. Store data is written to the store queue where it is held until the 
store instruction is retired. Most integer operate instructions write their register results 
in this cycle. 


2.2.1 Pipeline Aborts 


The abort penalty as given is measured from the cycle after the fetch stage of the 
instruction which triggers the abort to the fetch stage of the new target, ignoring any 
Ibox pipeline stalls or queuing delay that the triggering instruction might experience. 
Table 2—1 lists the timing associated with each common source of pipeline abort. 


Table 2-1 Pipeline Abort Delay (GCLK Cycles)


                             Penalty
Abort Condition              (Cycles)      Comments
Branch misprediction         7             Integer or floating-point conditional branch
                                           misprediction.
JSR misprediction            8             Memory format JSR or HW_RET.
Mbox order trap              14            Load-load order or store-load order.
Other Mbox replay traps      13            —
DTB miss                     13            —
ITB miss                     7             —
Integer arithmetic trap      12            —
Floating-point arithmetic    13+latency    Add latency of instruction. See Section 2.3.3 for
trap                                       instruction latencies.
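

As a worked illustration of Table 2-1, the sketch below (Python, with hypothetical names) looks up an abort penalty and adds the instruction latency for the floating-point arithmetic trap case. The value 4 used in the usage note is the fadd latency from Table 2-4; everything else is copied from Table 2-1.

```python
# Penalties in GCLK cycles, copied from Table 2-1.
ABORT_PENALTY = {
    "branch misprediction": 7,
    "jsr misprediction": 8,
    "mbox order trap": 14,
    "other mbox replay trap": 13,
    "dtb miss": 13,
    "itb miss": 7,
    "integer arithmetic trap": 12,
    "floating-point arithmetic trap": 13,  # plus the latency of the instruction
}

def abort_penalty(condition, instruction_latency=0):
    """Return the abort penalty in cycles for one abort condition."""
    base = ABORT_PENALTY[condition]
    if condition == "floating-point arithmetic trap":
        return base + instruction_latency
    return base
```

For example, `abort_penalty("floating-point arithmetic trap", 4)` gives 17 cycles for a trapping fadd, while `abort_penalty("dtb miss")` gives 13.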


2.3 Instruction Issue Rules 


This section defines instruction classes, the functional unit pipelines to which they are 
issued, and their associated latencies. 


Table 2-2 lists the instruction class, the pipeline assignments, and the instructions 
included in the class. 


Table 2-2 Instruction Name, Pipeline, and Types


Class
Name        Pipeline              Instruction Type
ild         L0, L1                All integer load instructions
fld         L0, L1                All floating-point load instructions
ist         L0, L1                All integer store instructions
fst         FST0, FST1, L0, L1    All floating-point store instructions
lda         L0, L1, U0, U1        LDA, LDAH
mem_misc    L1                    WH64, ECB, WMB
rpcc        L1                    RPCC
rx          L1                    RS, RC
mxpr        L0, L1                HW_MTPR, HW_MFPR
            (depends on IPR)
ibr         U0, U1                Integer conditional branch instructions
jsr         L0                    BR, BSR, JMP, CALL, RET, COR, HW_RET,
                                  CALL_PAL
iadd        L0, U0, L1, U1        Instructions with opcode 10₁₆, except CMPBGE
ilog        L0, U0, L1, U1        AND, BIC, BIS, ORNOT, XOR, EQV, CMPBGE
ishf        U0, U1                Instructions with opcode 12₁₆
cmov        L0, U0, L1, U1        Integer CMOV — either cluster
imul        U1                    Integer multiply instructions
imisc       U0                    CTLZ, CTPOP, CTTZ, PERR, MINxxx, MAXxxx,
                                  PKxx, UNPKxx
fbr         FA                    Floating-point conditional branch instructions
fadd        FA                    All floating-point operate instructions except multiply,
                                  divide, square root, and conditional move instructions
fmul        FM                    Floating-point multiply instruction
fcmov1      FA                    Floating-point CMOV — first half
fcmov2      FA                    Floating-point CMOV — second half
fdiv        FA                    Floating-point divide instruction
fsqrt       FA                    Floating-point square root instruction
nop         None                  TRAPB, EXCB, UNOP - LDQ_U R31, 0(Rx)
ftoi        FST0, FST1, L0, L1    FTOIS, FTOIT
itof        L0, L1                ITOFS, ITOFF, ITOFT
mx_fpcr     FM                    Instructions that move data from the floating-point
                                  control register


2.3.2 Ebox Slotting 


Instructions that are issued from the IQ, and could execute in either upper or lower 
Ebox subclusters, are slotted to one pair or the other during the pipeline mapping stage 
based on the instruction mixture in the fetch line. The codes that are used in Table 2-3 
are as follows: 


• U — The instruction only executes in an upper subcluster.
• L — The instruction only executes in a lower subcluster.
• E — The instruction could execute in either an upper or lower subcluster.


Table 2-3 defines the slotting rules. The table field Instruction Class 3, 2, 1 and 0 iden- 
tifies each instruction’s location in the fetch line by the value of bits [3:2] 
in its PC. 


Table 2-3 Instruction Group Definitions and Pipeline Unit


Instruction Class   Slotting   Instruction Class   Slotting
3210                3210       3210                3210

EEEE                ULUL       LLLL                LLLL
EEEL                ULUL       LLLU                LLLU
EEEU                ULLU       LLUE                LLUU
EELE                ULLU       LLUL                LLUL
EELL                UULL       LLUU                LLUU
EELU                ULLU       LUEE                LULU
EEUE                ULUL       LUEL                LUUL
EEUL                ULUL       LUEU                LULU
EEUU                LLUU       LULE                LULU
ELEE                ULUL       LULL                LULL
ELEL                ULUL       LULU                LULU
ELEU                ULLU       LUUE                LUUL
ELLE                ULLU       LUUL                LUUL
ELLL                ULLL       LUUU                LUUU
ELLU                ULLU       UEEE                ULUL
ELUE                ULUL       UEEL                ULUL
ELUL                ULUL       UEEU                ULLU
ELUU                LLUU       UELE                ULLU
EUEE                LULU       UELL                UULL
EUEL                LUUL       UELU                ULLU
EUEU                LULU       UEUE                ULUL
EULE                LULU       UEUL                ULUL
EULL                UULL       UEUU                ULUU
EULU                LULU       ULEE                ULUL
EUUE                LUUL       ULEL                ULUL
EUUL                LUUL       ULEU                ULLU
EUUU                LUUU       ULLE                ULLU
LEEE                LULU       ULLL                ULLL
LEEL                LUUL       ULLU                ULLU
LEEU                LULU       ULUE                ULUL
LELE                LULU       ULUL                ULUL
LELL                LULL       ULUU                ULUU
LELU                LULU       UUEE                UULL
LEUE                LUUL       UUEL                UULL
LEUL                LUUL       UUEU                UULU
LEUU                LLUU       UULE                UULL
LLEE                LLUU       UULL                UULL
LLEL                LLUL       UULU                UULU
LLEU                LLUU       UUUE                UUUL
LLLE                LLLU       UUUL                UUUL
—                   —          UUUU                UUUU
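

One property of Table 2-3 can be checked mechanically: an instruction classed as U or L keeps that assignment in the slotting column, and only E positions are resolved to U or L. The Python sketch below tests this property on a five-row excerpt of the table. The function name is hypothetical, and the sketch does not attempt to reproduce the rule the slot logic uses to choose between U and L for an E position.

```python
# Five rows copied from Table 2-3 (instruction class -> slotting).
SLOTTING_EXCERPT = {
    "EEEE": "ULUL",
    "EELL": "UULL",
    "ELLL": "ULLL",
    "LEEE": "LULU",
    "UUUE": "UUUL",
}

def is_consistent(instruction_class, slotting):
    """True if slotting uses only U/L and preserves every fixed U or L position."""
    if len(instruction_class) != 4 or len(slotting) != 4:
        return False
    return all(
        s in "UL" and (c == "E" or c == s)
        for c, s in zip(instruction_class, slotting)
    )
```

Every row of the excerpt passes `is_consistent`; a pairing such as LLUL with ULUL fails, because position 3 would move a lower-only instruction to an upper subcluster.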


2.3.3 Instruction Latencies


After an instruction is placed in the IQ or FQ, its issue point is determined by the avail- 
ability of its register operands, functional unit(s), and relationship to other instructions 
in the queue. There are register producer-consumer dependencies and dynamic func- 
tional unit availability dependencies that affect instruction issue. The mapper removes 
register producer-producer dependencies. 


The latency to produce a register result is generally fixed. The one exception is for load 
instructions that miss the Dcache. Table 2-4 lists the latency, in cycles, for each 
instruction class. 


Table 2-4 Instruction Class Latency in Cycles


Class     Latency   Comments
ild       3         Dcache hit.
          13+       Dcache miss, latency with 6-cycle Bcache. Add additional Bcache loop
                    latency if Bcache latency is greater than 6 cycles.
fld       4         Dcache hit.
          14+       Dcache miss, latency with 6-cycle Bcache. Add additional Bcache loop
                    latency if Bcache latency is greater than 6 cycles.
ist       —         Does not produce register value.
fst       —         Does not produce register value.
rpcc      1         Possible 1-cycle cross-cluster delay.
rx        1         —
mxpr      1 or 3    HW_MFPR: Ebox IPRs = 1.
                    Ibox and Mbox IPRs = 3.
                    HW_MTPR does not produce a register value.
icbr      —         Conditional branch. Does not produce register value.
ubr       3         Unconditional branch. Does not produce register value.
jsr       3         —
iadd      1         Possible 1-cycle Ebox cross-cluster delay.
ilog      1         Possible 1-cycle Ebox cross-cluster delay.
ishf      1         Possible 1-cycle Ebox cross-cluster delay.
cmov1     1         Only consumer is cmov2. Possible 1-cycle Ebox cross-cluster delay.
cmov2     1         Possible 1-cycle Ebox cross-cluster delay.
imul      7         Possible 1-cycle Ebox cross-cluster delay.
imisc     3         Possible 1-cycle Ebox cross-cluster delay.
fcbr      —         Does not produce register value.
fadd      4         Consumer other than fst or ftoi.
          6         Consumer fst or ftoi.
                    Measured from when an fadd is issued from the FQ to when an fst or
                    ftoi is issued from the IQ.
fmul      4         Consumer other than fst or ftoi.
          6         Consumer fst or ftoi.
                    Measured from when an fmul is issued from the FQ to when an fst or
                    ftoi is issued from the IQ.
fcmov1    4         Only consumer is fcmov2.
fcmov2    4         Consumer other than fst.
          6         Consumer fst or ftoi.
                    Measured from when an fcmov2 is issued from the FQ to when an fst or
                    ftoi is issued from the IQ.
fdiv      12        Single precision - latency to consumer of result value.
          9         Single precision - latency to using divider again.
          15        Double precision - latency to consumer of result value.
          12        Double precision - latency to using divider again.
fsqrt     18        Single precision - latency to consumer of result value.
          15        Single precision - latency to using unit again.
          33        Double precision - latency to consumer of result value.
          30        Double precision - latency to using unit again.
ftoi      3         —
itof      4         —
nop       —         Does not produce register value.


2.4 Instruction Retire Rules 


An instruction is retired when it has been executed to completion, and all previous 
instructions have been retired. The execution pipeline stage in which an instruction 
becomes eligible to be retired depends upon the instruction’s class. 


Table 2—5 gives the minimum retire latencies (assuming that all previous instructions 
have been retired) for various classes of instructions. 


Table 2-5 Minimum Retire Latencies for Instruction Classes


Instruction Class             Retire Stage    Comments
Integer conditional branch    7               —
Integer multiply              7/13            Latency is 13 cycles for the MUL/V instruction.
Integer operate               7               —
Memory                        10              —
Floating-point add            11              —
Floating-point multiply       11              —
Floating-point DIV/SQRT       11 + latency    Add latency of unit reuse for the instruction
                                              indicated in Table 2-4. For example, latency for
                                              a single-precision fdiv would be 11 plus 9 from
                                              Table 2-4. Latency is 11 if hardware detects that
                                              no exception is possible (see Section 2.4.1).
Floating-point conditional    11              Branch instruction mispredict is reported in
branch                                        stage 7.
BSR/JSR                       10              JSR instruction mispredict is reported in stage 8.


2.4.1 Floating-Point Divide/Square Root Early Retire 


The floating-point divider and square root unit can detect that, for many combinations
of source operand values, no exception can be generated. Instructions with these operands
can be retired before the result is generated. When detected, they are retired with
the same latency as the FP add class. Early retirement is not possible for the following
instruction/operand/architecture state conditions:


• Instruction is not a DIV or SQRT.
• SQRT source operand is negative.
• Divide operand exponent_a is 0.
• Either operand is NaN or INF.
• Divide operand exponent_b is 0.
• Trapping mode is /I (inexact).
• INE status bit is 0.


Early retirement is also not possible for divide instructions if the resulting exponent has 
any of the following characteristics (EXP is the result exponent): 


• DIVT, DIVG: (EXP >= 3FF₁₆) OR (EXP <= 2₁₆)
• DIVS, DIVF: (EXP >= 7F₁₆) OR (EXP <= 382₁₆)


2.5 Retire of Operate Instructions into R31/F31 


Many instructions that have R31 or F31 as their destination are retired immediately 
upon decode (stage 3). These instructions do not produce a result and are removed from 
the pipeline as well. They do not occupy a slot in the issue queues and do not occupy a 
functional unit. Table 2-6 lists these instructions and some of their characteristics. The 
instruction type in Table 2—6 is from Table C-6 in Appendix C of the Alpha Architecture 
Handbook, Version 4. 


Table 2-6 Instructions Retired Without Execution


Instruction Type          Notes
INTA, INTL, INTM, INTS    All with R31 as destination.
FLTI, FLTL, FLTV          All with F31 as destination. MT_FPCR is not included
                          because it has no destination—it is never removed from the
                          pipeline.
LDQ_U                     All with R31 as destination.
MISC                      TRAPB and EXCB are always removed. Others are never
                          removed.
FLTS                      All (SQRT, ITOF) with F31 as destination.


2.6 Load Instructions to R31 and F31 


This section describes how the 21264A processes software-directed prefetch transactions
and load instructions with a destination of R31 and F31.


Load operations to R31 and F31 may generate exceptions. These exceptions must be 
dismissed by PALcode. 


2.6.1 Normal Prefetch: LDBU, LDF, LDG, LDL, LDT, LDWU Instructions


The 21264A processes these instructions as normal cache line prefetches. If the load
instruction hits the Dcache, the instruction is dismissed; otherwise, the addressed cache
block is allocated into the Dcache.


2.6.2 Prefetch with Modify Intent: LDS Instruction 


The 21264A processes an LDS instruction, with F31 as the destination, as a prefetch 
with modify intent transaction (ReadBlkModSpec command). If the transaction hits a
dirty Dcache block, the instruction is dismissed. Otherwise, the addressed cache block 
is allocated into the Dcache for write access, with its dirty and modified bits set. 


2.6.3 Prefetch, Evict Next: LDQ Instruction 


The 21264A processes this instruction like a normal prefetch transaction (ReadBlkSpec 
command), with one exception—if the load misses the Dcache, the addressed cache 
block is allocated into the Dcache, but the Dcache set allocation pointer is left pointing 
to this block. The next miss to the same Dcache line will evict the block. For example, 
this instruction might be used when software is reading an array that is known to fit in 
the offchip Bcache, but will not fit into the onchip Dcache. In this case, the instruction 
ensures that the hardware provides the desired prefetch function without displacing use- 
ful cache blocks stored in the other set within the Dcache. 


2.7 Special Cases of Alpha Instruction Execution 


This section describes the mechanisms that the 21264A uses to process irregular 
instructions in the Alpha instruction set, and cases in which the 21264A processes 
instructions in a non-intuitive way.


2.7.1 Load Hit Speculation


The latency of integer load instructions that hit in the Dcache is three cycles. Figure 2-9
shows the pipeline timing for these integer load instructions. In Figure 2-9:


Symbol   Meaning
Q        Issue queue
R        Register file read
E        Execute
D        Dcache access
B        Data bus active


Figure 2-9 Pipeline Timing for Integer Load Instructions


                                Hit
Cycle Number     1   2   3   4   5   6   7   8
ILD              Q   R   E   D   B
Instruction 1                Q   R
Instruction 2                    Q


There are two cycles in which the IQ may speculatively issue instructions that use load
data before Dcache hit information is known. Any instructions that are issued by the IQ 
within this 2-cycle speculative window are kept in the IQ with their requests inhibited 
until the load instruction’s hit condition is known, even if they are not dependent on the 
load operation. If the load instruction hits, then these instructions are removed from the 
queue. If the load instruction misses, then the execution of these instructions is aborted 
and the instructions are allowed to request service again. 


For example, in Figure 2-9, instruction 1 and instruction 2 are issued within the specu- 
lative window of the load instruction. If the load instruction hits, then both instructions 
will be deleted from the queue by the start of cycle 7—one cycle later than normal for 
instruction 1 and at the normal time for instruction 2. If the load instruction misses, both
instructions are aborted from the execution pipelines and may request service again in 
cycle 6. 


IQ-issued instructions are aborted if issued within the speculative window of an integer 
load instruction that missed in the Dcache, even if they are not dependent on the load 
data. However, if software misses are likely, the 21264A can still benefit from schedul- 
ing the instruction stream for Dcache miss latency. The 21264A includes a saturating 
counter that is incremented when load instructions hit and is decremented when load 
instructions miss. When the upper bit of the counter equals zero, the integer load 
latency is increased to five cycles and the speculative window is removed. The counter 
is 4 bits wide and is incremented by 1 on a hit and is decremented by two on a miss. 
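

The counter behavior described above can be modeled in a few lines. The following Python sketch is an illustration only: the class and method names are hypothetical, and the initial counter value of 15 is an assumption, since the text does not state the reset value.

```python
class LoadHitCounter:
    """4-bit saturating counter: +1 on a Dcache hit, -2 on a Dcache miss."""

    def __init__(self, value=15):  # initial value is an assumption
        self.value = value

    def record(self, hit):
        if hit:
            self.value = min(15, self.value + 1)
        else:
            self.value = max(0, self.value - 2)

    def speculation_enabled(self):
        # The upper bit (bit 3) of the 4-bit counter gates the speculative window.
        return bool(self.value & 0b1000)

    def integer_load_latency(self):
        # Three cycles with hit speculation, five cycles when it is disabled.
        return 3 if self.speculation_enabled() else 5
```

Starting from 15, four consecutive misses bring the counter to 7, which clears the upper bit and raises the modeled load latency to five cycles; a single subsequent hit returns it to 8 and restores the 3-cycle latency.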


Since load instructions to R31 do not produce a result, they do not create a speculative 
window when they execute and, therefore, never waste IQ-issue cycles if they miss. 




Floating-point load instructions that hit in the Dcache have a latency of four cycles.
Figure 2-10 shows the pipeline timing for floating-point load instructions. In Figure 2-10:


Symbol   Meaning
Q        Issue queue
R        Register file read
E        Execute
D        Dcache access
B        Data bus active


Figure 2-10 Pipeline Timing for Floating-Point Load Instructions


Cycle Number     1   2   3   4   5   6   7   8
Instruction 1                    Q   R
Instruction 2                        Q


The speculative window for floating-point load instructions is one cycle wide.
FQ-issued instructions that are issued within the speculative window of a floating-point 
load instruction that has missed, are only aborted if they depend on the load being suc- 
cessful. 


For example, in Figure 2-10, instruction 1 is issued in the speculative window of the
load instruction. 


If instruction 1 is not a user of the data returned by the load instruction, then it is 
removed from the queue at its normal time (at the start of cycle 7). 


If instruction 1 is dependent on the load instruction data and the load instruction hits, 
instruction 1 is removed from the queue one cycle later (at the start of cycle 8). If the
load instruction misses, then instruction 1 is aborted from the Fbox pipeline and may 
request service again in cycle 7. 


2.7.2 Floating-Point Store Instructions 


Floating-point store instructions are duplicated and loaded into both the IQ and the FQ 
from the mapper. Each IQ entry contains a control bit, fpWait, that when set prevents 
that entry from asserting its requests. This bit is initially set for each floating-point store 
instruction that enters the IQ, unless it was the target of a replay trap. The instruction’s 
FQ clone is issued when its Ra register is about to become clean, resulting in its IQ 
clone’s fpWait bit being cleared and allowing the IQ clone to issue and be executed by 
the Mbox. This mechanism ensures that floating-point store instructions are always 
issued to the Mbox, along with the associated data, without requiring the floating-point 
register dirty bits to be available within the IQ. 


2.7.3 CMOV Instruction


For the 21264A, the Alpha CMOV instruction has three operands, and so presents a 
special case. The required operation is to move either the value in register Rb or the 
value from the old physical destination register into the new destination register, based 
upon the value in Ra. Since neither the mapper nor the Ebox and Fbox data paths are 
otherwise required to handle three operand instructions, the CMOV instruction is 
decomposed by the Ibox pipeline into two 2-operand instructions: 


The Alpha architecture instruction    CMOV Ra, Rb => Rc
Becomes the 21264A instructions       CMOV1 Ra, oldRc => newRc1
                                      CMOV2 newRc1, Rb => newRc2


The first instruction, CMOV1, tests the value of Ra and records the result of this test in
a 65th bit of its destination register, newRc1. It also copies the value of the old physical
destination register, oldRc, to newRc1.


The second instruction, CMOV2, then copies either the value in newRc1 or the value in
Rb into a second physical destination register, newRc2, based on the CMOV predicate
bit stored in newRc1.


In summary, the original CMOV instruction is decomposed into two dependent instruc- 
tions that each use a physical register from the free list. 
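

The two-step decomposition can be sketched functionally. The Python model below is a simplified illustration, not the hardware data path: a physical register is represented as a (value, predicate) pair to stand in for the 65th bit, the condition is passed in as a function, and all names are hypothetical.

```python
def cmov1(ra, old_rc, condition):
    """CMOV1: test Ra, keep the old destination value, record the predicate bit."""
    return (old_rc, bool(condition(ra)))      # newRc1 = (value, 65th bit)

def cmov2(new_rc1, rb):
    """CMOV2: select Rb if the predicate is set, otherwise the value in newRc1."""
    value, predicate = new_rc1
    return rb if predicate else value         # newRc2

def cmoveq(ra, rb, old_rc):
    """CMOVEQ Ra, Rb => Rc expressed as the two dependent operations."""
    return cmov2(cmov1(ra, old_rc, lambda x: x == 0), rb)
```

With this model, `cmoveq(0, 5, 9)` yields 5 because Ra is zero, and `cmoveq(1, 5, 9)` yields the old destination value 9.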


To further simplify this operation, the two component instructions of a CMOV instruction
are driven through the mappers in successive cycles. Hence, if a fetch line contains
n CMOV instructions, it takes n+1 cycles to run that fetch line through the mappers.


For example, the following fetch line:

ADD CMOVx SUB CMOVy

Results in the following three map cycles:

ADD CMOVx1
CMOVx2 SUB CMOVy1
CMOVy2
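

The split into map cycles shown above follows a simple pattern: each CMOV ends the current map cycle with its first half and starts the next cycle with its second half. A minimal Python sketch of that pattern follows. The function name is hypothetical, and the model only identifies CMOV instructions by a name prefix.

```python
def map_cycles(fetch_line):
    """Group a fetch line into map cycles, splitting each CMOV across two cycles."""
    cycles = [[]]
    for op in fetch_line:
        if op.startswith("CMOV"):
            cycles[-1].append(op + "1")   # first half ends the current map cycle
            cycles.append([op + "2"])     # second half opens the next map cycle
        else:
            cycles[-1].append(op)
    return cycles
```

Applied to the fetch line ADD, CMOVx, SUB, CMOVy, the function returns the three groups listed above, which matches the n+1 rule for n = 2.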


The Ebox executes integer CMOV instructions as two distinct 1-cycle latency opera- 
tions. The Fbox add pipeline executes floating-point CMOV instructions as two distinct 
4-cycle latency operations. 


2.8 Memory and I/O Address Space Instructions 


This section provides an overview of the way the 21264A processes memory and I/O
address space instructions.


The 21264A supports, and internally recognizes, a 44-bit physical address space that is 
divided equally between memory address space and I/O address space. Memory 
address space resides in the lower half of the physical address space (PA[43]=0) 

and I/O address space resides in the upper half of the physical address space 
(PA[43]=1). 
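

The split of the physical address space can be expressed as a single bit test. The Python sketch below is illustrative, and the function name is hypothetical.

```python
PA_BITS = 44  # width of the physical address recognized by the 21264A

def is_io_address(pa):
    """True if a 44-bit physical address lies in I/O space (PA[43] = 1)."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("physical address does not fit in 44 bits")
    return bool((pa >> 43) & 1)
```

Addresses below `1 << 43` are classified as memory space, and addresses from `1 << 43` up to `(1 << 44) - 1` are classified as I/O space.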


The IQ can issue any combination of load and store instructions to the Mbox at the rate
of two per cycle. The two lower Ebox subclusters, LO and L1, generate the 
48-bit effective virtual address for these instructions. 


An instruction is defined to be newer than another instruction if it follows that instruc- 
tion in program order and is older if it precedes that instruction in program order. 


2.8.1 Memory Address Space Load Instructions 


The Mbox begins execution of a load instruction by translating its virtual address to a 
physical address using the DTB and by accessing the Dcache. The Dcache is virtually 
indexed, allowing these two operations to be done in parallel. The Mbox puts information
about the load instruction, including its physical address, destination register, and
data format, into the LQ. 


If the requested physical location is found in the Dcache (a hit), the data is formatted 
and written into the appropriate integer or floating-point register. If the location is not in 
the Dcache (a miss), the physical address is placed in the miss address file (MAF) for 
processing by the Cbox. The MAF performs a merging function in which a new miss 
address is compared to miss addresses already held in the MAF. If the new miss address
points to the same Dcache block as a miss address in the MAF, then the new miss 
address is discarded. 
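

The merging function can be pictured as a lookup keyed by Dcache block address. The Python sketch below is a simplified model under stated assumptions: the class and method names are hypothetical, the 8-entry capacity comes from Section 2.1.6.3, and the "full" result is an assumption added so that the model is complete, since the text does not describe what happens when no entry is free.

```python
BLOCK_BITS = 6  # 64-byte Dcache blocks

class MissAddressFile:
    def __init__(self, entries=8):
        self.entries = entries
        self.blocks = []          # block addresses of pending fill requests

    def add_miss(self, physical_address):
        block = physical_address >> BLOCK_BITS
        if block in self.blocks:
            return "merged"       # same block already pending: discard the new address
        if len(self.blocks) >= self.entries:
            return "full"         # assumption: the source does not specify this case
        self.blocks.append(block)
        return "allocated"
```

Two misses to addresses 0x1000 and 0x103F fall in the same 64-byte block, so the second one merges, while a miss to 0x1040 allocates a second entry.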


When Dcache fill data is returned to the Dcache by the Cbox, the Mbox satisfies the 
requesting load instructions in the LQ. 


2.8.2 I/O Address Space Load Instructions 


Because I/O space load instructions may have side effects, they cannot be performed 
speculatively. When the Mbox receives an I/O space load instruction, the Mbox places 
the load instruction in the LQ, where it is held until it retires. The Mbox replays retired 
I/O space load instructions from the LQ to the MAF in program order, at a rate of one 
per GCLK cycle. 


The Mbox allocates a new MAF entry to an I/O load instruction and increases I/O band- 
width by attempting to merge I/O load instructions in a merge register. Table 2-7 shows 
the rules for merging data. The columns represent the load instructions replayed to the 
MAF while the rows represent the size of the load in the merge register. 


Table 2-7 Rules for I/O Address Space Load Instruction Data Merging


Merge Register/
Replayed Instruction    Load Byte/Word    Load Longword           Load Quadword
Byte/Word               No merge          No merge                No merge
Longword                No merge          Merge up to 32 bytes    No merge
Quadword                No merge          No merge                Merge up to 64 bytes


In summary, Table 2—7 shows some of the following rules: 


• Byte/word load instructions and different size load instructions are not allowed to
  merge.

• A stream of ascending non-overlapping, but not necessarily consecutive, longword
  load instructions are allowed to merge into naturally aligned 32-byte blocks.

• A stream of ascending non-overlapping, but not necessarily consecutive, quadword
  load instructions are allowed to merge into naturally aligned 64-byte blocks.

• Merging of quadwords can be limited to naturally aligned 32-byte blocks based on
  the Cbox WRITE_ONCE chain 32_BYTE_IO field.

• To minimize latency, the I/O register merge window is closed when a timer detects
  no I/O load instruction activity for 14 cycles, or zero cycles if the last QW/LW of
  the block is addressed.
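

The size and alignment rules above can be condensed into one predicate. The Python sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, addresses are assumed to be naturally aligned for their size, `limit_qw_to_32` stands in for the 32_BYTE_IO field, and the merge window timer is not modeled.

```python
# Merge block size in bytes for each mergeable access size (longword, quadword).
BLOCK_BYTES = {4: 32, 8: 64}

def can_merge(last_addr, last_size, addr, size, limit_qw_to_32=False):
    """True if a replayed I/O load may merge with the load in the merge register."""
    if size != last_size or size not in BLOCK_BYTES:
        return False                       # byte/word and mixed sizes never merge
    block = 32 if limit_qw_to_32 else BLOCK_BYTES[size]
    same_block = (addr // block) == (last_addr // block)
    ascending = addr >= last_addr + last_size   # ascending and non-overlapping
    return same_block and ascending
```

Under this model, a longword at 0x108 merges with one at 0x100, a longword at 0x120 does not (it falls in the next 32-byte block), and a quadword at 0x120 merges with one at 0x100 unless `limit_qw_to_32` is set.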


After the Mbox I/O register has closed its merge window, the Cbox sends I/O read
requests offchip in the order that they were received from the Mbox. 


2.8.3 Memory Address Space Store Instructions 


The Mbox begins execution of a store instruction by translating its virtual address to a 
physical address using the DTB and by probing the Dcache. The Mbox puts informa- 
tion about the store instruction, including its physical address, its data and the results of 
the Dcache probe, into the store queue (SQ). 


If the Mbox does not find the addressed location in the Dcache, it places the address 
into the MAF for processing by the Cbox. If the Mbox finds the addressed location in a 
Dcache block that is not dirty, then it places a ChangeToDirty request into the MAF. 


A store instruction can write its data into the Dcache when it is retired, and when the
Dcache block containing its address is dirty and not shared. SQ entries that meet these
two conditions can be placed into the writeable state. These SQ entries are placed into 
the writeable state in program order at a maximum rate of two entries per cycle. The 
Mbox transfers writeable store queue entry data from the SQ to the Dcache in program 
order at a maximum rate of two entries per cycle. Dcache lines associated with write- 
able store queue entries are locked by the Mbox. System port probe commands cannot 
evict these blocks until their associated writeable SQ entries have been transferred into 
the Dcache. This restriction assists in STx_C instruction and Dcache ECC processing. 


SQ entry data that has not been transferred to the Dcache may source data to newer load 
instructions. The Mbox compares the virtual Dcache index bits of incoming load 
instructions to queued SQ entries, and sources the data from the SQ, bypassing the 
Dcache, when necessary.


2.8.4 I/O Address Space Store Instructions 


The Mbox begins processing I/O space store instructions, like memory space store 
instructions, by translating the virtual address and placing the state associated with the 
store instruction into the SQ. 


The Mbox replays retired I/O space store entries from the SQ to the IOWB in program 
order at a rate of one per GCLK cycle. The Mbox never allows queued I/O space store 
instructions to source data to subsequent load instructions. 




The Cbox maximizes I/O bandwidth when it allocates a new IOWB entry to an I/O
store instruction by attempting to merge I/O store instructions in a merge register. Table
2-8 shows the rules for I/O space store instruction data merging. The columns represent
the store instructions replayed to the IOWB while the rows represent the size of the store
in the merge register.


Table 2-8 Rules for I/O Address Space Store Instruction Data Merging

Merge Register/
Replayed Instruction   Store Byte/Word   Store Longword         Store Quadword
Byte/Word              No merge          No merge               No merge
Longword               No merge          Merge up to 32 bytes   No merge
Quadword               No merge          No merge               Merge up to 64 bytes


Table 2—8 shows some of the following rules: 


• Byte/word store instructions and different size store instructions are not allowed to
  merge.

• A stream of ascending non-overlapping, but not necessarily consecutive, longword
  store instructions is allowed to merge into naturally aligned 32-byte blocks.

• A stream of ascending non-overlapping, but not necessarily consecutive, quadword
  store instructions is allowed to merge into naturally aligned 64-byte blocks.

• Merging of quadwords can be limited to naturally aligned 32-byte blocks based on
  the Cbox WRITE_ONCE chain 32_BYTE_IO field.

• To minimize latency, the I/O register merge window is closed when a timer detects
  no I/O store instruction activity for 1024 cycles. Issued MB, WMB,
  and I/O load instructions also close the merge window.


After the IOWB merge register has closed its merge window, the Cbox sends I/O space 
store requests offchip in the order that they were received from the Mbox. 


2.9 MAF Memory Address Space Merging Rules 


Because all memory transactions are to 64-byte blocks, efficiency is improved by merg- 
ing several small data transactions into a single larger data transaction. Table 2-9 lists 
the rules the 21264A uses when merging memory transactions into 64-byte naturally 
aligned data block transactions. Rows represent the merged instruction in the MAF and 
columns represent the new issued transaction. 


Table 2-9 MAF Merging Rules

MAF/New    LDx     STx     STx_C   WH64    ECB     Istream
LDx        Merge   -       -       -       -       -
STx        Merge   Merge   -       -       -       -
STx_C      -       -       Merge   -       -       -
WH64       -       -       -       Merge   -       -
ECB        -       -       -       -       Merge   -
Istream    -       -       -       -       -       Merge


In summary, Table 2-9 shows that only like instruction types, with the exception of 
load instructions merging with store instructions, are merged. 
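
Table 2-9 can be read as a lookup from the transaction already held in the MAF (row) to the set of new transactions that may merge with it (column). The sketch below encodes that reading; the dictionary and function names are hypothetical and exist only for illustration.

```python
# Rows: transaction type already in the MAF.
# Values: new transaction types that merge with it, per Table 2-9.
MAF_MERGE = {
    "LDx":     {"LDx"},
    "STx":     {"LDx", "STx"},   # the one case where unlike types merge
    "STx_C":   {"STx_C"},
    "WH64":    {"WH64"},
    "ECB":     {"ECB"},
    "Istream": {"Istream"},
}

def maf_merges(maf_entry, new_txn):
    """Return True if new_txn merges into an existing MAF entry of type maf_entry."""
    return new_txn in MAF_MERGE[maf_entry]
```

Note that the relation is not symmetric in this reading of the table: a new load merges with a store already in the MAF, but a new store does not merge with a load already in the MAF.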


2.10 Instruction Ordering 


In the absence of explicit instruction ordering, such as with MB or WMB instructions,
the 21264A maintains a default instruction ordering relationship between pairs of load
and store instructions.


The 21264A maintains the default memory data instruction ordering as shown in 
Table 2—10 (assume address X and address Y are different). 


Table 2-10 Memory Reference Ordering

First Instruction in Pair    Second Instruction in Pair   Reference Order
Load memory to address X     Load memory to address X     Maintained (litmus test 1)
Load memory to address X     Load memory to address Y     Not maintained
Store memory to address X    Store memory to address X    Maintained
Store memory to address X    Store memory to address Y    Maintained
Load memory to address X     Store memory to address X    Maintained
Load memory to address X     Store memory to address Y    Not maintained
Store memory to address X    Load memory to address X     Maintained
Store memory to address X    Load memory to address Y     Not maintained


The 21264A maintains the default I/O instruction ordering as shown in Table 2-11
(assume address X and address Y are different).

Table 2-11 I/O Reference Ordering

First Instruction in Pair    Second Instruction in Pair   Reference Order
Load I/O to address X        Load I/O to address X        Maintained
Load I/O to address X        Load I/O to address Y        Maintained
Store I/O to address X       Store I/O to address X       Maintained
Store I/O to address X       Store I/O to address Y       Maintained
Load I/O to address X        Store I/O to address X       Maintained
Load I/O to address X        Store I/O to address Y       Not maintained
Store I/O to address X       Load I/O to address X        Maintained
Store I/O to address X       Load I/O to address Y        Not maintained
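
The two ordering tables reduce to a few rules, which the following sketch expresses as one function. It is a restatement of Tables 2-10 and 2-11 for illustration, not a model of the hardware; the function name and argument names are assumptions.

```python
def order_maintained(first_op, second_op, same_address, io_space=False):
    """Default ordering for a pair of references, per Tables 2-10 and 2-11.

    first_op / second_op: "load" or "store".
    same_address: True when both references target address X.
    io_space: False for memory space (Table 2-10), True for I/O space (Table 2-11).
    """
    if same_address:
        return True                       # every same-address pair keeps its order
    if first_op == "store" and second_op == "store":
        return True                       # stores to different addresses stay ordered
    if io_space and first_op == "load" and second_op == "load":
        return True                       # I/O loads stay ordered; memory loads do not
    return False
```

Under this reading, the only difference between the two address spaces is the load-load pair to different addresses, which is ordered in I/O space and unordered in memory space.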

2.11 Replay Traps


There are some situations in which a load or store instruction cannot be executed due to 
a condition that occurs after that instruction issues from the IQ or FQ. The instruction is 
aborted (along with all newer instructions) and restarted from the fetch stage of the 
pipeline. This mechanism is called a replay trap. 


2.11.1 Mbox Order Traps 


Load and store instructions may be issued from the IQ in a different order than they 
were fetched from the Icache, while the architecture dictates that Dstream memory 
transactions to the same physical bytes must be completed in order. Usually, the Mbox 
manages the memory reference stream by itself to achieve architecturally correct 
behavior, but the two cases in which the Mbox uses replay traps to manage the memory 
stream are load-load and store-load order traps. 


2.11.1.1 Load-Load Order Trap 


The Mbox ensures that load instructions that read the same physical byte(s) ultimately 
issue in correct order by using the load-load order trap. The Mbox compares the 
address of each load instruction, as it is issued, to the address of all load instructions in 
the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes 
a load-load order trap on the newer instruction. This is a replay trap that aborts the tar- 
get of the trap and all newer instructions from the machine and refetches instructions 
starting at the target of the trap. 
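
The load-load check described above can be sketched as a search of the load queue. This is a simplified illustration under stated assumptions: addresses are compared for exact equality rather than for overlapping physical bytes, only already-issued loads are considered, and when several newer loads match, the oldest of them is taken as the trap target. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LoadQueueEntry:
    seq: int        # program-order sequence number (larger means newer)
    addr: int       # physical address of the load
    issued: bool    # True once the load has issued

def load_load_trap_target(load_queue, issuing):
    """Return the newer, already-issued load to the same address, or None.

    Replaying the returned entry also discards every instruction newer than it.
    """
    hits = [e for e in load_queue
            if e.issued and e.seq > issuing.seq and e.addr == issuing.addr]
    return min(hits, key=lambda e: e.seq) if hits else None
```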


2.11.1.2 Store-Load Order Trap 


The Mbox ensures that a load instruction ultimately issues after an older store instruc-
tion that writes some portion of its memory operand by using the store-load order trap.
The Mbox compares the address of each store instruction, as it is issued, to the address
of all load instructions in the load queue. If the Mbox finds a newer load instruction in
the load queue, it invokes a store-load order trap on the load instruction. This is a replay
trap. It functions like the load-load order trap.


The Ibox contains extra hardware to reduce the frequency of the store-load trap. There 
is a 1-bit by 1024-entry VPC-indexed table in the Ibox called the stWait table. When an 
Icache instruction is fetched, the associated stWait table entry is fetched along with the 
Icache instruction. The stWait table produces 1 bit for each instruction accessed from 
the Icache. When a load instruction gets a store-load order replay trap, its associated bit 
in the stWait table is set during the cycle that the load is refetched. Hence, the trapping 
load instruction’s stWait bit will be set the next time it is fetched. 


The IQ will not issue load instructions whose stWait bit is set while there are older unis-
sued store instructions in the queue. A load instruction whose stWait bit is set can be
issued the cycle immediately after the last older store instruction is issued from the
queue. All the bits in the stWait table are unconditionally cleared every 16384 cycles, or
every 65536 cycles if I_CTL[ST_WAIT_64K] is set.
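
The behavior of the stWait table can be sketched as a small model. The table size and clear intervals come from the text above; the indexing function (low-order instruction-address bits) and all class and method names are assumptions made for illustration, since the text says only that the table is VPC-indexed.

```python
class StWaitTable:
    ENTRIES = 1024                          # 1 bit by 1024 entries

    def __init__(self, clear_interval=16384):
        self.bits = [False] * self.ENTRIES
        self.clear_interval = clear_interval   # 65536 if I_CTL[ST_WAIT_64K] is set
        self.cycles = 0

    def _index(self, vpc):
        # Assumption: index with low-order bits of the 4-byte-aligned VPC.
        return (vpc >> 2) % self.ENTRIES

    def record_trap(self, vpc):
        """Set the bit for a load that took a store-load order replay trap."""
        self.bits[self._index(vpc)] = True

    def must_wait(self, vpc, older_unissued_stores):
        """A marked load is held while older unissued stores remain in the queue."""
        return self.bits[self._index(vpc)] and older_unissued_stores > 0

    def tick(self, n=1):
        """Advance time; clear every bit once the clear interval elapses."""
        self.cycles += n
        if self.cycles >= self.clear_interval:
            self.bits = [False] * self.ENTRIES
            self.cycles %= self.clear_interval
```

The periodic clear keeps a load from being held indefinitely after the store that once conflicted with it is no longer relevant.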

2.11.2 Other Mbox Replay Traps


The Mbox also uses replay traps to control the flow of the load queue and store queue,
and to ensure that there are never multiple outstanding misses to different physical
addresses that map to the same Dcache or Bcache line. Unlike the order traps, however,
these replay traps are invoked on the incoming instruction that triggered the condition.


2.12 I/O Write Buffer and the WMB Instruction 


The I/O write buffer (IOWB) consists of four 64-byte entries with the associated 
address and control logic used to buffer I/O write data between the store queue (SQ) 
and the system port. 


2.12.1 Memory Barrier (MB/WMB/TB Fill Flow) 


The Cbox CSR SYSBUS_MB_ENABLE bit determines if MB instructions produce 
external system port transactions. When the SYSBUS_MB_ENABLE bit equals 0, the 
Cbox CSR MB_CNT[3:0] field contains the number of pending uncommitted transac- 
tions. The counter will increment for each of the following commands: 


• RdBlk, RdBlkMod, RdBlkI
• RdBlkSpec (valid), RdBlkModSpec (valid), RdBlkSpecI (valid)
• RdBlkVic, RdBlkModVic, RdBlkVicI
• CleanToDirty, SharedToDirty, STChangeToDirty, InvalToDirty
• FetchBlk, FetchBlkSpec (valid), Evict
• RdByte, RdLw, RdQw, WrByte, WrLW, WrQW


The counter is decremented with the C (commit) bit in the Probe and SysDc commands 
(see Section 4.7.7). Systems can assert the C bit in the SysDc fill response to the com- 
mands that originally incremented the counter, or attached to the last probe seen by that 
command when it reached the system serialization point. If the number of uncommitted 
transactions reaches 15 (saturating the counter), the Cbox will stall MAF and IOWB 
processing until at least one of the pending transactions has been committed. Probe pro- 
cessing is not interrupted by the state of this counter. 
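
The counter behavior described above can be sketched as follows. This is a minimal model, assuming one increment per counted command and one decrement per asserted C bit; the class and method names are hypothetical.

```python
class MbCounter:
    MAX = 15   # MB_CNT[3:0] is a 4-bit field and saturates at 15

    def __init__(self):
        self.count = 0

    def stalled(self):
        """MAF and IOWB processing stalls while the counter is saturated."""
        return self.count >= self.MAX

    def issue(self):
        """Try to send one counted command; return False if processing is stalled."""
        if self.stalled():
            return False
        self.count += 1
        return True

    def commit(self):
        """Model a Probe or SysDc command arriving with the C (commit) bit set."""
        if self.count > 0:
            self.count -= 1
```

In this sketch a sixteenth command cannot be sent until at least one earlier transaction has been committed. Probe processing is deliberately left out of the model because the counter does not gate it.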


2.12.1.1 MB Instruction Processing 


When an MB instruction is fetched in the predicted instruction execution path, it stalls
in the map stage of the pipeline. This also stalls all instructions after the MB, and con-
trol of instruction flow is based upon the value in Cbox CSR SYSBUS_MB_ENABLE
as follows:


• If Cbox CSR SYSBUS_MB_ENABLE is clear, the Cbox waits until the IQ is
  empty and then performs the following actions:

  a. Sends all pending MAF and IOWB entries to the system port.

  b. Monitors Cbox CSR MB_CNT[3:0], a 4-bit counter of outstanding committed
     events. When the counter decrements from one to zero, the Cbox marks the
     youngest probe queue entry.

  c. Waits until the MAF contains no more Dstream references and the SQ, LQ, and
     IOWB are empty.

  When all of the above have occurred and a probe response has been sent to the sys-
  tem for the marked probe queue entry, instruction execution continues with the
  instruction after the MB.


• If Cbox CSR SYSBUS_MB_ENABLE is set, the Cbox waits until the IQ is empty
  and then performs the following actions:

  a. Sends all pending MAF and IOWB entries to the system port.

  b. Sends the MB command to the system port.

  c. Waits until the MB command is acknowledged, then marks the youngest entry
     in the probe queue.

  d. Waits until the MAF contains no more Dstream references and the SQ, LQ, and
     IOWB are empty.

  When all of the above have occurred and a probe response has been sent to the sys-
  tem for the marked probe queue entry, instruction execution continues with the
  instruction after the MB.


Because the MB instruction is executed speculatively, MB processing can begin 
and the original MB can be killed. In the internal acknowledge case, the MB may 
have already been sent to the system interface, and the system is still expected to 
respond to the MB.


2.12.1.2 WMB Instruction Processing 


Write memory barrier (WMB) instructions are issued into the Mbox store-queue, where 
they wait until they are retired and all prior store instructions become writeable. The 
Mbox then stalls the writeable pointer and informs the Cbox. The Cbox closes the 
IOWB merge register and responds in one of the following two ways: 


• If Cbox CSR SYSBUS_MB_ENABLE is clear, the Cbox performs the following
  actions:

  a. Stalls further MAF and IOWB processing.

  b. Monitors Cbox CSR MB_CNT[3:0], a 4-bit counter of outstanding committed
     events. When the counter decrements from one to zero, the Cbox marks the
     youngest probe queue entry.

  c. When a probe response has been sent to the system for the marked probe queue
     entry, the Cbox considers the WMB to be satisfied.

• If Cbox CSR SYSBUS_MB_ENABLE is set, the Cbox performs the following
  actions:

  a. Stalls further MAF and IOWB processing.

  b. Sends the MB command to the system port.

  c. Waits until the MB command is acknowledged by the system with a SysDc
     MBDone command, then sends acknowledge and marks the youngest entry in
     the probe queue.

  d. When a probe response has been sent to the system for the marked probe queue
     entry, the Cbox considers the WMB to be satisfied.

2.12.1.3 TB Fill Flow


Load instructions (HW_LDs) to a virtual page table entry (VPTE) are processed by the 
21264A to avoid litmus test problems associated with the ordering of memory transac- 
tions from another processor against loading of a page table entry and the subsequent 
virtual-mode load from this processor. 


Consider the sequence shown in Table 2—12. The data could be in the Bcache. Pj should 
fetch datai if it is using PTEi. 


Table 2-12 TB Fill Flow Example Sequence 1

Pi             Pj
Write Datai    Load/Store datai
MB             <TB miss>
Write PTEi     Load-PTE
               <write TB>
               Load/Store (restart)


Also consider the related sequence shown in Table 2—13. In this case, the data could be 
cached in the Bcache; Pj should fetch datai if it is using PTEi.


Table 2-13 TB Fill Flow Example Sequence 2

Pi             Pj
Write Datai    Istream read datai
MB             <TB miss>
Write PTEi     Load-PTE
               <write TB>
               Istream read (restart) - will miss the Icache


The 21264A processes Dstream loads to the PTE by injecting, in hardware, some mem- 
ory barrier processing between the PTE transaction and any subsequent load or store 
instruction. This is accomplished by the following mechanism: 


1. The integer queue issues a HW_LD instruction with VPTE. 


2. The integer queue issues a HW_MTPR instruction with a DTB_PTE0 that is data-
dependent on the HW_LD instruction with a VPTE, and is required in order to fill
the DTBs. The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4]
and [0].


3. When a HW_MTPR instruction with a DTB_PTE0 is issued, the Ibox signals the
Cbox indicating that a HW_LD instruction with a VPTE has been processed. This
causes the Cbox to begin processing the MB instruction. The Ibox prevents any
subsequent memory operations being issued by not clearing the IPR scoreboard bit
[0]. IPR scoreboard bit [0] is one of the scoreboard bits associated with the
HW_MTPR instruction with DTB_PTE0.


4. When the Cbox completes processing the MB instruction (using one of the above 
sequences, depending upon the state of SYSBUS_MB_ENABLE), the Cbox sig- 
nals the Ibox to clear IPR scoreboard bit [0]. 

The 21264A uses a similar mechanism to process Istream TB misses and fills to the
PTE for the Istream.


1. The integer queue issues a HW_LD instruction with VPTE. 


2. The IQ issues a HW_MTPR instruction with an ITB_PTE that is data-dependent 
upon the HW_LD instruction with VPTE. This is required in order to fill the ITB. 
The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4] and [0]. 


3. The IQ issues a HW_MTPR instruction for the ITB_PTE and signals the Cbox
that a HW_LD/VPTE instruction has been processed, causing the Cbox to start pro-
cessing the MB instruction. The Mbox stalls Ibox fetching from when the HW_LD/
VPTE instruction finishes until the probe queue is drained.


4. When the 21264A is finished (SYS_MB selects one of the above sequences), the 
Cbox directs the Ibox to clear IPR scoreboard bit [0]. Also, the Mbox directs the 
Ibox to start prefetching. 


Inserting MB instruction processing within the TB fill flow is only required for multi- 
processor systems. Uniprocessor systems can disable MB instruction processing by 
deasserting Ibox CSR I_CTL[TB_MB_EN]. 


2.13 Performance Measurement Support—Performance Counters 


The 21264A provides hardware support for two methods of obtaining program perfor- 
mance feedback information. The two methods do not require program modification. 
The first method offers similar capabilities to earlier microprocessor performance 
counters. The second method supports the new ProfileMe way of statistically sampling 
individual instructions during program execution to develop a model of program execu- 
tion. Both methods use the same hardware registers. 


See Section 6.10 for information about counter control. 


2.14 Floating-Point Control Register 


The floating-point control register (FPCR) is shown in Figure 2-11. 




Figure 2-11 Floating-Point Control Register

[Figure: FPCR bit layout, bits 63 down to 48, in order: SUM, INED, UNFD, UNDZ,
DYN, IOV, INE, UNF, OVF, DZE, INV, OVFD, DZED, INVD, DNZ. LK99-0050A]


The floating-point control register fields are described in Table 2—14. 


Table 2-14 Floating-Point Control Register Fields

Name      Extent    Type  Description

SUM       [63]      RW    Summary bit. Records bit-wise OR of FPCR exception bits.

INED      [62]      RW    Inexact Disable. If this bit is set and a floating-point
                          instruction which enables trapping on inexact results
                          generates an inexact value, the result is placed in the
                          destination register and the trap is suppressed.

UNFD      [61]      RW    Underflow Disable. The 21264A hardware cannot generate
                          IEEE compliant denormal results. UNFD is used in
                          conjunction with UNDZ as follows:

                          UNFD  UNDZ  Result
                          0     X     Underflow trap.
                          1     0     Trap to supply a possible denormal result.
                          1     1     Underflow trap suppressed. Destination is
                                      written with a true zero (+0.0).

UNDZ      [60]      RW    Underflow to zero. When UNDZ is set together with UNFD,
                          underflow traps are disabled and the 21264A places a true
                          zero in the destination register. See UNFD, above.

DYN       [59:58]   RW    Dynamic rounding mode. Indicates the rounding mode to be
                          used by an IEEE floating-point instruction when the
                          instruction specifies dynamic rounding mode:

                          Bits  Meaning
                          00    Chopped
                          01    Minus infinity
                          10    Normal
                          11    Plus infinity

IOV       [57]      RW    Integer overflow. An integer arithmetic operation or a
                          conversion from floating-point to integer overflowed the
                          destination precision.

INE       [56]      RW    Inexact result. A floating-point arithmetic or conversion
                          operation gave a result that differed from the
                          mathematically exact result.

UNF       [55]      RW    Underflow. A floating-point arithmetic or conversion
                          operation gave a result that underflowed the destination
                          exponent.

OVF       [54]      RW    Overflow. A floating-point arithmetic or conversion
                          operation gave a result that overflowed the destination
                          exponent.

DZE       [53]      RW    Divide by zero. An attempt was made to perform a
                          floating-point divide with a divisor of zero.

INV       [52]      RW    Invalid operation. An attempt was made to perform a
                          floating-point arithmetic operation and one or more of its
                          operand values were illegal.

OVFD      [51]      RW    Overflow disable. If this bit is set and a floating-point
                          arithmetic operation generates an overflow condition, then
                          the appropriate IEEE nontrapping result is placed in the
                          destination register and the trap is suppressed.

DZED      [50]      RW    Division by zero disable. If this bit is set and a
                          floating-point divide by zero is detected, the appropriate
                          IEEE nontrapping result is placed in the destination
                          register and the trap is suppressed.

INVD      [49]      RW    Invalid operation disable. If this bit is set and a
                          floating-point operate generates an invalid operation
                          condition and the 21264A is capable of producing the
                          correct IEEE nontrapping result, that result is placed in
                          the destination register and the trap is suppressed.

DNZ       [48]      RW    Denormal operands to zero. If this bit is set, treat all
                          Denormal operands as a signed zero value with the same
                          sign as the Denormal operand.

Reserved  [47:0]¹   -     -

¹ Alpha architecture FPCR bit 47 (DNOD) is not implemented by the 21264A.
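
The field layout in Table 2-14 can be exercised with a short decoding sketch. The bit positions and the UNFD/UNDZ combinations are taken from the table; the function names and the returned strings are illustrative assumptions, and the code decodes a plain 64-bit integer rather than reading a real register.

```python
# Single-bit FPCR fields and their bit positions, from Table 2-14.
FPCR_BITS = {
    "SUM": 63, "INED": 62, "UNFD": 61, "UNDZ": 60,
    "IOV": 57, "INE": 56, "UNF": 55, "OVF": 54, "DZE": 53, "INV": 52,
    "OVFD": 51, "DZED": 50, "INVD": 49, "DNZ": 48,
}
# DYN occupies bits [59:58].
DYN_MODES = {0b00: "Chopped", 0b01: "Minus infinity",
             0b10: "Normal", 0b11: "Plus infinity"}

def decode_fpcr(value):
    """Split a 64-bit FPCR value into named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in FPCR_BITS.items()}
    fields["DYN"] = DYN_MODES[(value >> 58) & 0b11]
    return fields

def underflow_action(fields):
    """Apply the UNFD/UNDZ combinations listed under UNFD."""
    if not fields["UNFD"]:
        return "underflow trap"
    if not fields["UNDZ"]:
        return "trap to supply a possible denormal result"
    return "trap suppressed, destination written with +0.0"
```

For instance, a value with UNFD and UNDZ both set and DYN equal to binary 10 decodes to the Normal rounding mode with underflow traps suppressed.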


2.15 AMASK and IMPLVER Instruction Values 


The AMASK and IMPLVER instructions return the supported architecture extensions
and the processor type, respectively.


2.15.1 AMASK


The 21264A returns the AMASK instruction values provided in Table 2-15. The 
I_CTL register reports the 21264A pass level (see I_CTL[CHIP_ID], Section 5.2.15). 


Table 2-15 21264A AMASK Values

21264A Pass Level                  AMASK Value Returned
See I_CTL[CHIP_ID], Table 5-11     307₁₆





The AMASK bit definitions provided in Table 2-15 are defined in Table 2-16. 


Table 2-16 AMASK Bit Assignments 
Bit Meaning 


0 Support for the byte/word extension (BWX) 
The instructions that comprise the BWX extension are LDBU, LDWU, SEXTB, 
SEXTW, STB, and STW. 


1 Support for the square-root and floating-point convert extension (FIX)
The instructions that comprise the FIX extension are FTOIS, FTOIT, ITOFF, ITOFS,
ITOFT, SQRTF, SQRTG, SQRTS, and SQRTT.


2 Support for the count extension (CIX) 
The instructions that comprise the CIX extension are CTLZ, CTPOP, and CTTZ. 


8 Support for the multimedia extension (MVI) 
The instructions that comprise the MVI extension are MAXSB8, MAXSW4, 
MAXUB8, MAXUW4, MINSB8, MINSW4, MINUB8, MINUW4, PERR, PKLB, 
PKWB, UNPKBL, and UNPKBW. 


9 Support for precise arithmetic trap reporting in hardware. The trap PC is the same as 
the instruction PC after the trapping instruction is executed. 
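
As a check on Tables 2-15 and 2-16, the returned value 307 (hexadecimal) can be decoded against the bit assignments. The sketch treats the table value as a simple feature bit mask; the dictionary and function names are hypothetical.

```python
# Bit assignments from Table 2-16.
AMASK_BITS = {
    0: "BWX",
    1: "FIX",
    2: "CIX",
    8: "MVI",
    9: "precise arithmetic traps",
}

def decode_amask(value):
    """List the features whose bits are set in an AMASK value."""
    return [name for bit, name in sorted(AMASK_BITS.items())
            if value & (1 << bit)]

features = decode_amask(0x307)
```

The value 0x307 has bits 0, 1, 2, 8, and 9 set, so all five entries of Table 2-16 are reported.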


2.15.2 IMPLVER 
For the 21264A, the IMPLVER instruction returns the value 2. 


2.16 Design Examples 


The 21264A can be designed into many different uniprocessor and multiprocessor sys- 
tem configurations. Figures 2-12 and 2-13 illustrate two possible configurations. 
These configurations employ additional system/memory controller chipsets. 


Figure 2-12 shows a typical uniprocessor system with a second-level cache. This sys- 
tem configuration could be used in standalone or networked workstations. 

Figure 2-12 Typical Uniprocessor Configuration


Figure 2-13 shows a typical multiprocessor system, each processor with a second-level
cache. Each interface controller must employ a duplicate tag store to maintain cache 
coherency. This system configuration could be used in a networked database server 
application. 

[Figure 2-13: typical multiprocessor configuration. Labels recoverable from the
figure: 21264A processors, 21272 core logic chipset (control chip and data slice),
host PCI bridge chips, 64-bit PCI buses. FM-05574-EV67]




3 Hardware Interface





This chapter contains the 21264A microprocessor logic symbol and provides informa- 
tion about signal names, their function, and their location. This chapter also describes 
the mechanical specifications of the 21264A. It is organized as follows: 


• The 21264A logic symbol
• The 21264A signal names and functions
• Lists of the signal pins, sorted by name and PGA location
• The specifications for the 21264A mechanical package
• The top and bottom views of the 21264A pinouts


3.1 21264A Microprocessor Logic Symbol 
Figure 3-1 shows the logic symbol for the 21264A chip.

Figure 3-1 21264A Microprocessor Logic Symbol


3.2 21264A Signal Names and Functions


Table 3-1 defines the 21264A signal types referred to in this section. 


Table 3-1 Signal Pin Types Definitions

Signal Type     Definition

Inputs
I_DC_REF        Input DC reference pin
I_DA            Input differential amplifier receiver
I_DA_CLK        Input clock pin

Outputs
O_OD            Open drain output driver
O_OD_TP         Open drain driver for test pins
O_PP            Push/pull output driver
O_PP_CLK        Push/pull output clock driver

Bidirectional
B_DA_OD         Bidirectional differential amplifier receiver with open drain output
B_DA_PP         Bidirectional differential amplifier receiver with push/pull output

Other
Spare           Reserved to COMPAQ¹
NoConnect       No connection. Do not connect to these pins for any revision of the
                21264A. These pins must float.

¹ All Spare connections are Reserved to COMPAQ to maintain compatibility between
passes of the chip. Designers should not use these pins.


Table 3-2 lists all signal pins in alphabetic order and provides a full functional descrip- 
tion of the pins. Table 3—4 lists the signal pins and their corresponding pin grid array 
(PGA) locations in alphabetic order for the signal type. Table 3—5 lists the pin grid array 
locations in alphabetical order. 


Table 3-2 21264A Signal Descriptions

Signal                   Type       Count  Description

BcAdd_H[23:4]            O_PP       20     These signals provide the index to the Bcache.

BcCheck_H[15:0]          B_DA_PP    16     ECC check bits for BcData_H[127:0].

BcData_H[127:0]          B_DA_PP    128    Bcache data signals.

BcDataInClk_H[7:0]       I_DA       8      Bcache data input clocks. These clocks are used
                                           with high-speed SDRAMs, such as DDRs, that
                                           provide a clock-out with data-output pins to
                                           optimize Bcache read bandwidths. The 21264A
                                           internally synchronizes the data to its logic
                                           with clock forward receive circuits similar to
                                           the system interface.

BcDataOE_L               O_PP       1      Bcache data output enable. The 21264A asserts
                                           this signal during Bcache read operations.

BcDataOutClk_H[3:0]      O_PP       8      Bcache data output clocks. These free-running
BcDataOutClk_L[3:0]                        clocks are differential copies of the Bcache
                                           clock and are derived from the 21264A GCLK.
                                           Their period is a multiple of the GCLK and is
                                           fixed for all operations. They can be configured
                                           so that their rising edge lags BcAdd_H[23:4] by
                                           0 to 2 GCLK cycles. The 21264A synchronizes tag
                                           output information with these clocks.

BcDataWr_L               O_PP       1      Bcache data write enable. The 21264A asserts
                                           this signal when writing data to the Bcache data
                                           arrays.

BcLoad_L                 O_PP       1      Bcache burst enable.

BcTag_H[42:20]           B_DA_PP    23     Bcache tag bits.

BcTagDirty_H             B_DA_PP    1      Tag dirty state bit. During cache write
                                           operations, the 21264A will assert this signal
                                           if the Bcache data has been modified.

BcTagInClk_H             I_DA       1      Bcache tag input clock. The 21264A uses this
                                           input clock to latch the tag information on
                                           Bcache read operations. This clock is used with
                                           high-speed SDRAMs, such as DDRs, that provide a
                                           clock-out with data-output pins to optimize
                                           Bcache read bandwidths. The 21264A internally
                                           synchronizes the data to its logic with clock
                                           forward receive circuits similar to the system
                                           interface.

BcTagOE_L                O_PP       1      Bcache tag output enable. This signal is
                                           asserted by the 21264A for Bcache read
                                           operations.

BcTagOutClk_H            O_PP       2      Bcache tag output clock. These clocks "echo" the
BcTagOutClk_L                              clock-forwarded BcDataOutClk_x[3:0] clocks.

BcTagParity_H            B_DA_PP    1      Tag parity state bit.

BcTagShared_H            B_DA_PP    1      Tag shared state bit. The 21264A will write a 1
                                           on this signal line if another agent has a copy
                                           of the cache line.

BcTagValid_H             B_DA_PP    1      Tag valid state bit. If set, this line indicates
                                           that the cache line is valid.

BcTagWr_L                O_PP       1      Tag RAM write enable. The 21264A asserts this
                                           signal when writing a tag to the Bcache tag
                                           arrays.

BcVref                   I_DC_REF   1      Bcache tag reference voltage.

ClkFwdRst_H              I_DA       1      Systems assert this synchronous signal to wake
                                           up a powered-down 21264A. The ClkFwdRst_H signal
                                           is clocked into a 21264A register by the
                                           captured FrameClk_x signals. Systems must ensure
                                           that the timing of this signal meets 21264A
                                           requirements (see Section 4.7.2).

ClkIn_H                  I_DA_CLK   2      Differential input signals provided by the
ClkIn_L                                    system.

DCOK_H                   I_DA       1      dc voltage OK. Must be deasserted until dc
                                           voltage reaches proper operating level. After
                                           that, DCOK_H is asserted.

EV6Clk_H                 O_PP_CLK   2      Provides an external test point to measure phase
EV6Clk_L                                   alignment of the PLL.

FrameClk_H               I_DA_CLK   2      A skew-controlled differential 50% duty cycle
FrameClk_L                                 copy of the system clock. It is used by the
                                           21264A as a reference, or framing, clock.

IRQ_H[5:0]               I_DA       6      These six interrupt signal lines may be asserted
                                           by the system. The response of the 21264A is
                                           determined by the system software.

MiscVref                 I_DC_REF   1      Voltage reference for the miscellaneous pins
                                           (see Table 3-3).

PllBypass_H              I_DA       1      When asserted, this signal will cause the two
                                           input clocks (ClkIn_x) to be applied to the
                                           21264A internal circuits, instead of the 21264A
                                           global clock (GCLK).

PLL_VDD                  3.3 V      1      3.3-V dedicated power supply for the 21264A PLL.

Reset_L                  I_DA       1      System reset. This signal protects the 21264A
                                           from damage during initial power-up. It must be
                                           asserted until DCOK_H is asserted. After that,
                                           it is deasserted and the 21264A begins its reset
                                           sequence.

SromClk_H                O_OD_TP    1      Serial ROM clock. Supplies the clock that causes
                                           the SROM to advance to the next bit. The cycle
                                           time for this clock is 256 times the cycle time
                                           of the GCLK (internal 21264A clock).

SromData_H               I_DA       1      Serial ROM data. Input data line from the SROM.

SromOE_L                 O_OD_TP    1      Serial ROM enable. Supplies the output enable to
                                           the SROM.

SysAddIn_L[14:0]         I_DA       15     Time-multiplexed command/address/ID/Ack from
                                           system to the 21264A.

SysAddInClk_L            I_DA       1      Single-ended forwarded clock from system for
                                           SysAddIn_L[14:0] and SysFillValid_L.

SysAddOut_L[14:0]        O_OD       15     Time-multiplexed command/address/ID/mask from
                                           the 21264A to the system bus.

SysAddOutClk_L           O_OD       1      Single-ended forwarded clock output for
                                           SysAddOut_L[14:0].

SysCheck_L[7:0]          B_DA_OD    8      Quadword ECC check bits for SysData_L[63:0].

SysData_L[63:0]          B_DA_OD    64     Data bus for memory and I/O data.

SysDataInClk_H[7:0]      I_DA       8      Single-ended system-generated clocks for clock
                                           forwarded input system data.

SysDataInValid_L         I_DA       1      When asserted, marks a valid data cycle for data
                                           transfers to the 21264A.

SysDataOutClk_L[7:0]     O_OD       8      Single-ended 21264A-generated clocks for clock
                                           forwarded output system data.

SysDataOutValid_L        I_DA       1      When asserted, marks a valid data cycle for data
                                           transfers from the 21264A.

SysFillValid_L           I_DA       1      When asserted, this bit indicates validation for
                                           the cache fill delivered in the previous system
                                           SysDc command.

SysVref                  I_DC_REF   1      System interface reference voltage.
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Table 3-2 21264A Signal Descriptions (Continued) 








Signal Type Count Description 

Tck_H LDA | IEEE 1149.1 test clock. 

Tdi_H I_LDA 1 IEEE 1149.1 test data-in signal. 

Tdo_H O_OD_TP 1 IEEE 1149.1 test data-out signal. 

TestStat_H O_OD_TP 1 Test status pin. System reset drives the test status pin low. 


The TestStat_H pin is forced high at the start of the Icache 
BiST. If the Icache BiST passes, the pin is deasserted at the end 
of the BiST operation; otherwise, it remains high. 

The 21264A generates a timeout reset signal if an instruction is 
not retired within one billion cycles. 

The 21264A signals the timeout reset event by outputting a 256 
GCLK cycle wide pulse on TestStat_H. 


Tms_H I_LDA 1 IEEE 1149.1 test mode select signal. 
Trst_L IDA ] IEEE 1149.1 test access port (TAP) reset signal. 


Table 3-3 lists signals by function and provides an abbreviated description. 


Table 3-3 21264A Signal Descriptions by Function

Signal                 Type       Count  Description

BcVref Domain

BcAdd_H[23:4]          O_PP       20     Bcache index.
BcCheck_H[15:0]        B_DA_PP    16     ECC check bits for BcData_H[127:0].
BcData_H[127:0]        B_DA_PP    128    Bcache data.
BcDataInClk_H[7:0]     I_DA       8      Bcache data input clocks.
BcDataOE_L             O_PP       1      Bcache data output enable.
BcDataOutClk_H[3:0]    O_PP       8      Bcache data output clocks.
BcDataOutClk_L[3:0]
BcDataWr_L             O_PP       1      Bcache data write enable.
BcLoad_L               O_PP       1      Bcache burst enable.
BcTag_H[42:20]         B_DA_PP    23     Bcache tag bits.
BcTagDirty_H           B_DA_PP    1      Tag dirty state bit.
BcTagInClk_H           I_DA       1      Bcache tag input clock.
BcTagOE_L              O_PP       1      Bcache tag output enable.
BcTagOutClk_H          O_PP       2      Bcache tag output clocks.
BcTagOutClk_L
BcTagParity_H          B_DA_PP    1      Tag parity state bit.
BcTagShared_H          B_DA_PP    1      Tag shared state bit.
BcTagValid_H           B_DA_PP    1      Tag valid state bit.
BcTagWr_L              O_PP       1      Tag RAM write enable.
BcVref                 I_DC_REF   1      Tag data input reference voltage.

SysVref Domain

SysAddIn_L[14:0]       I_DA       15     Time-multiplexed SysAddIn, system-to-21264A.
SysAddInClk_L          I_DA       1      Single-ended forwarded clock from system for
                                         SysAddIn_L[14:0] and SysFillValid_L.
SysAddOut_L[14:0]      O_OD       15     Time-multiplexed SysAddOut, 21264A-to-system.
SysAddOutClk_L         O_OD       1      Single-ended forwarded clock.
SysCheck_L[7:0]        B_DA_OD    8      Quadword ECC check bits for SysData_L[63:0].
SysData_L[63:0]        B_DA_OD    64     Data bus for memory and I/O data.
SysDataInClk_H[7:0]    I_DA       8      Single-ended system-generated clocks for clock
                                         forwarded input system data.
SysDataInValid_L       I_DA       1      When asserted, marks a valid data cycle for data
                                         transfers to the 21264A.
SysDataOutClk_L[7:0]   O_OD       8      Single-ended 21264A-generated clocks for clock
                                         forwarded output system data.
SysDataOutValid_L      I_DA       1      When asserted, marks a valid data cycle for data
                                         transfers from the 21264A.
SysFillValid_L         I_DA       1      Validation for fill given in previous SysDc
                                         command.
SysVref                I_DC_REF   1      System interface reference voltage.

Clocks and PLL

ClkIn_H                I_DA_CLK   2      Differential input signals provided by the system.
ClkIn_L
EV6Clk_H               O_PP_CLK   2      Provides an external test point to measure phase
EV6Clk_L                                 alignment of the PLL.
FrameClk_H             I_DA_CLK   2      A skew-controlled differential 50% duty cycle
FrameClk_L                               copy of the system clock. It is used by the
                                         21264A as a reference, or framing, clock.
PLL_VDD                3.3 V      1      3.3-V dedicated power supply for the 21264A PLL.

MiscVref Domain

ClkFwdRst_H            I_DA       1      Systems assert this synchronous signal to wake up
                                         a powered-down 21264A. The ClkFwdRst_H signal is
                                         clocked into a 21264A register by the captured
                                         FrameClk_x signals.
DCOK_H                 I_DA       1      dc voltage OK. Must be deasserted until dc voltage
                                         reaches proper operating level. After that,
                                         DCOK_H is asserted.
IRQ_H[5:0]             I_DA       6      These six interrupt signal lines may be asserted
                                         by the system.
MiscVref               I_DC_REF   1      Reference voltage for miscellaneous pins.
PllBypass_H            I_DA       1      When asserted, this signal will cause the input
                                         clocks (ClkIn_x) to be applied to the 21264A
                                         internal circuits, instead of the 21264A's global
                                         clock (GCLK).
Reset_L                I_DA       1      System reset. This signal protects the 21264A
                                         from damage during initial power-up. It must be
                                         asserted until DCOK_H is asserted. After that, it
                                         is deasserted and the 21264A begins its reset
                                         sequence.
SromClk_H              O_OD_TP    1      Serial ROM clock.
SromData_H             I_DA       1      Serial ROM data.
SromOE_L               O_OD_TP    1      Serial ROM enable.
Tck_H                  I_DA       1      IEEE 1149.1 test clock.
Tdi_H                  I_DA       1      IEEE 1149.1 test data-in signal.
Tdo_H                  O_OD_TP    1      IEEE 1149.1 test data-out signal.
TestStat_H             O_OD_TP    1      Test status pin.
Tms_H                  I_DA       1      IEEE 1149.1 test mode select signal.
Trst_L                 I_DA       1      IEEE 1149.1 test access port (TAP) reset signal.


3.3 Pin Assignments 


The 21264A package has 587 pins arranged in a pin grid array (PGA). There are
380 functional signal pins, 1 dedicated 3.3-V pin for the PLL, 112 ground (VSS) pins,
and 94 power (VDD) pins. Table 3-4 lists the signal pins and their corresponding PGA
locations in alphabetical order by signal name. Table 3-5 lists the PGA locations in
alphabetical order.
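The pin counts quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only; the dictionary labels and the function name are assumptions made for this example and are not part of the specification.

```python
# Pin counts as quoted in Section 3.3. The labels are illustrative.
PIN_COUNTS = {
    "functional signal": 380,
    "PLL 3.3-V supply": 1,
    "VSS (ground)": 112,
    "VDD (power)": 94,
}

PACKAGE_PINS = 587  # total pins in the PGA package, per the text


def total_pins(counts):
    """Sum the per-category pin counts."""
    return sum(counts.values())


if __name__ == "__main__":
    # The categories should account for every pin in the package.
    print(total_pins(PIN_COUNTS) == PACKAGE_PINS)
```

The same check is useful when transcribing Tables 3-4 through 3-6 into a netlist or spreadsheet: the signal entries plus the VSS and VDD entries should add up to the package total.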


Table 3-4 Pin List Sorted by Signal Name

Signal Name         PGA Location  Signal Name         PGA Location  Signal Name         PGA Location

BcAdd_H_10          B30           BcAdd_H_11          D30           BcAdd_H_12          C31
BcAdd_H_13          H28           BcAdd_H_14          G29           BcAdd_H_15          A33
BcAdd_H_16          E31           BcAdd_H_17          D32           BcAdd_H_18          B34
BcAdd_H_19          A35           BcAdd_H_20          B36           BcAdd_H_21          H30
BcAdd_H_22          C35           BcAdd_H_23          E33           BcAdd_H_4           B28
BcAdd_H_5           E27           BcAdd_H_6           A29           BcAdd_H_7           G27
BcAdd_H_8           C29           BcAdd_H_9           F28           BcCheck_H_0         F2
BcCheck_H_1         AB4           BcCheck_H_10        AW1           BcCheck_H_11        BD10
BcCheck_H_12        E45           BcCheck_H_13        AC45          BcCheck_H_14        AT44
BcCheck_H_15        BB36          BcCheck_H_2         AT2           BcCheck_H_3         BC11
BcCheck_H_4         M38           BcCheck_H_5         AB42          BcCheck_H_6         AU43
BcCheck_H_7         BC37          BcCheck_H_8         M8            BcCheck_H_9         AA3
BcData_H_0          B10           BcData_H_1          D10           BcData_H_10         L3
BcData_H_100        D42           BcData_H_101        D44           BcData_H_102        H40
BcData_H_103        H42           BcData_H_104        G45           BcData_H_105        L43
BcData_H_106        L45           BcData_H_107        N45           BcData_H_108        T44
BcData_H_109        U45           BcData_H_11         M2            BcData_H_110        W45
BcData_H_111        AA43          BcData_H_112        AC43          BcData_H_113        AD44
BcData_H_114        AE41          BcData_H_115        AG45          BcData_H_116        AK44
BcData_H_117        AL43          BcData_H_118        AM42          BcData_H_119        AR45
BcData_H_12         T2            BcData_H_120        AP40          BcData_H_121        BA45
BcData_H_122        AV42          BcData_H_123        BB44          BcData_H_124        BB42
BcData_H_125        BC41          BcData_H_126        BA37          BcData_H_127        BD40
BcData_H_13         U1            BcData_H_14         V2            BcData_H_15         Y4
BcData_H_16         AC1           BcData_H_17         AD2           BcData_H_18         AE3
BcData_H_19         AG1           BcData_H_2          A5            BcData_H_20         AK2
BcData_H_21         AL3           BcData_H_22         AR1           BcData_H_23         AP2
BcData_H_24         AY2           BcData_H_25         BB2           BcData_H_26         AW5
BcData_H_27         BB4           BcData_H_28         BB8           BcData_H_29         BE5
BcData_H_3          C5            BcData_H_30         BB10          BcData_H_31         BE7
BcData_H_32         G33           BcData_H_33         C37           BcData_H_34         B40
BcData_H_35         C41           BcData_H_36         C43           BcData_H_37         E43
BcData_H_38         G41           BcData_H_39         F44           BcData_H_4          C3
BcData_H_40         K44           BcData_H_41         N41           BcData_H_42         M44
BcData_H_43         P42           BcData_H_44         U43           BcData_H_45         V44
BcData_H_46         Y42           BcData_H_47         AB44          BcData_H_48         AD42
BcData_H_49         AE43          BcData_H_5          E3            BcData_H_50         AF42
BcData_H_51         AJ45          BcData_H_52         AK42          BcData_H_53         AN45
BcData_H_54         AP44          BcData_H_55         AN41          BcData_H_56         AW45
BcData_H_57         AU41          BcData_H_58         AY44          BcData_H_59         BA43
BcData_H_6          H6            BcData_H_60         BC43          BcData_H_61         BD42
BcData_H_62         BB38          BcData_H_63         BE41          BcData_H_64         C11
BcData_H_65         A7            BcData_H_66         C9            BcData_H_67         B6
BcData_H_68         B4            BcData_H_69         D4            BcData_H_7          E1
BcData_H_70         G5            BcData_H_71         D2            BcData_H_72         H4
BcData_H_73         G1            BcData_H_74         N5            BcData_H_75         L1
BcData_H_76         N1            BcData_H_77         U3            BcData_H_78         W5
BcData_H_79         W1            BcData_H_8          J3            BcData_H_80         AB2
BcData_H_81         AC3           BcData_H_82         AD4           BcData_H_83         AF4
BcData_H_84         AJ3           BcData_H_85         AK4           BcData_H_86         AN1
BcData_H_87         AM4           BcData_H_88         AU5           BcData_H_89         BA1
BcData_H_9          K2            BcData_H_90         BA3           BcData_H_91         BC3
BcData_H_92         BD6           BcData_H_93         BA9           BcData_H_94         BC9
BcData_H_95         AY12          BcData_H_96         A39           BcData_H_97         D36
BcData_H_98         A41           BcData_H_99         B42           BcDataInClk_H_0     E7
BcDataInClk_H_1     R3            BcDataInClk_H_2     AH2           BcDataInClk_H_3     BC5
BcDataInClk_H_4     F38           BcDataInClk_H_5     U39           BcDataInClk_H_6     AH44
BcDataInClk_H_7     AY40          BcDataOE_L          A27           BcDataOutClk_H_0    J5
BcDataOutClk_H_1    AU3           BcDataOutClk_H_2    J43           BcDataOutClk_H_3    AR43
BcDataOutClk_L_0    K4            BcDataOutClk_L_1    AV4           BcDataOutClk_L_2    K42
BcDataOutClk_L_3    AT42          BcDataWr_L          D26           BcLoad_L            F26
BcTag_H_20          E13           BcTag_H_21          H16           BcTag_H_22          A11
BcTag_H_23          B12           BcTag_H_24          D14           BcTag_H_25          E15
BcTag_H_26          A13           BcTag_H_27          G17           BcTag_H_28          C15
BcTag_H_29          H18           BcTag_H_30          D16           BcTag_H_31          B16
BcTag_H_32          C17           BcTag_H_33          A17           BcTag_H_34          E19
BcTag_H_35          B18           BcTag_H_36          A19           BcTag_H_37          F20
BcTag_H_38          D20           BcTag_H_39          E21           BcTag_H_40          C21
BcTag_H_41          D22           BcTag_H_42          H22           BcTagDirty_H        C23
BcTagInClk_H        G19           BcTagOE_L           H24           BcTagOutClk_H       C25
BcTagOutClk_L       D24           BcTagParity_H       B22           BcTagShared_H       G23
BcTagValid_H        B24           BcTagWr_L           E25           BcVref              F18
ClkFwdRst_H         BE11          ClkIn_H             AM8           ClkIn_L             AN7
DCOK_H              AY18          EV6Clk_H            AM6           EV6Clk_L            AL7
FrameClk_H          AV16          FrameClk_L          AW15          IRQ_H_0             BA15
IRQ_H_1             BE13          IRQ_H_2             AW17          IRQ_H_3             AV18
IRQ_H_4             BC15          IRQ_H_5             BB16          MiscVref            AV22
NoConnect           BB14          NoConnect           BD2           PLL_VDD             AV8
PllBypass_H         BD12          Reset_L             BD16          Spare               AJ1
Spare               V38           Spare               AT4           Spare               BE9
Spare               F8            Spare               BD4           Spare               AJ43
Spare               AR3           Spare               T4            Spare               E39
Spare               BA39          Spare               BC21          SromClk_H           AW19
SromData_H          BC17          SromOE_L            BE17          SysAddIn_L_0        BD30
SysAddIn_L_1        BC29          SysAddIn_L_10       BB24          SysAddIn_L_11       AV24
SysAddIn_L_12       BD24          SysAddIn_L_13       BE23          SysAddIn_L_14       AW23
SysAddIn_L_2        AY28          SysAddIn_L_3        BE29          SysAddIn_L_4        AW27
SysAddIn_L_5        BA27          SysAddIn_L_6        BD28          SysAddIn_L_7        BE27
SysAddIn_L_8        AY26          SysAddIn_L_9        BC25          SysAddInClk_L       BB26
SysAddOut_L_0       AW33          SysAddOut_L_1       BE39          SysAddOut_L_10      BE33
SysAddOut_L_11      AW29          SysAddOut_L_12      BC31          SysAddOut_L_13      AV28
SysAddOut_L_14      BB30          SysAddOut_L_2       BD36          SysAddOut_L_3       BC35
SysAddOut_L_4       BA33          SysAddOut_L_5       AY32          SysAddOut_L_6       BE35
SysAddOut_L_7       AV30          SysAddOut_L_8       BB32          SysAddOut_L_9       BA31
SysAddOutClk_L      BD34          SysCheck_L_0        L7            SysCheck_L_1        AA5
SysCheck_L_2        AK8           SysCheck_L_3        BA13          SysCheck_L_4        L39
SysCheck_L_5        AA41          SysCheck_L_6        AM40          SysCheck_L_7        AY34
SysData_L_0         F14           SysData_L_1         G13           SysData_L_10        P6
SysData_L_11        T8            SysData_L_12        V8            SysData_L_13        V6
SysData_L_14        W7            SysData_L_15        Y6            SysData_L_16        AB8
SysData_L_17        AC7           SysData_L_18        AD8           SysData_L_19        AE5
SysData_L_2         F12           SysData_L_20        AH6           SysData_L_21        AH8
SysData_L_22        AJ7           SysData_L_23        AL5           SysData_L_24        AP8
SysData_L_25        AR7           SysData_L_26        AT8           SysData_L_27        AV6
SysData_L_28        AV10          SysData_L_29        AW11          SysData_L_3         H12
SysData_L_30        AV12          SysData_L_31        AW13          SysData_L_32        F32
SysData_L_33        F34           SysData_L_34        H34           SysData_L_35        G35
SysData_L_36        F40           SysData_L_37        G39           SysData_L_38        K38
SysData_L_39        J41           SysData_L_4         H10           SysData_L_40        M40
SysData_L_41        N39           SysData_L_42        P40           SysData_L_43        T38
SysData_L_44        V40           SysData_L_45        W41           SysData_L_46        W39
SysData_L_47        Y40           SysData_L_48        AB38          SysData_L_49        AC39
SysData_L_5         G7            SysData_L_50        AD38          SysData_L_51        AF40
SysData_L_52        AH38          SysData_L_53        AJ39          SysData_L_54        AL41
SysData_L_55        AK38          SysData_L_56        AN39          SysData_L_57        AP38
SysData_L_58        AR39          SysData_L_59        AT38          SysData_L_6         F6
SysData_L_60        AY38          SysData_L_61        AV36          SysData_L_62        AW35
SysData_L_63        AV34          SysData_L_7         K8            SysData_L_8         M6
SysData_L_9         N7            SysDataInClk_H_0    D8            SysDataInClk_H_1    P4
SysDataInClk_H_2    AF6           SysDataInClk_H_3    AY6           SysDataInClk_H_4    E37
SysDataInClk_H_5    R43           SysDataInClk_H_6    AG41          SysDataInClk_H_7    AV40
SysDataInValid_L    BD22          SysDataOutClk_L_0   G11           SysDataOutClk_L_1   U7
SysDataOutClk_L_2   AG7           SysDataOutClk_L_3   AY8           SysDataOutClk_L_4   H36
SysDataOutClk_L_5   R41           SysDataOutClk_L_6   AH40          SysDataOutClk_L_7   AW39
SysDataOutValid_L   BB22          SysFillValid_L      BC23          SysVref             BA25
Tck_H               BE19          Tdi_H               BA21          Tdo_H               BB20
TestStat_H          BA19          Tms_H               BD18          Trst_L              AY20


Table 3-5 Pin List Sorted by PGA Location

PGA Location  Signal Name         PGA Location  Signal Name         PGA Location  Signal Name

A11           BcTag_H_22          A13           BcTag_H_26          A17           BcTag_H_33
A19           BcTag_H_36          A27           BcDataOE_L          A29           BcAdd_H_6
A33           BcAdd_H_15          A35           BcAdd_H_19          A39           BcData_H_96
A41           BcData_H_98         A5            BcData_H_2          A7            BcData_H_65
AA3           BcCheck_H_9         AA41          SysCheck_L_5        AA43          BcData_H_111
AA5           SysCheck_L_1        AB2           BcData_H_80         AB38          SysData_L_48
AB4           BcCheck_H_1         AB42          BcCheck_H_5         AB44          BcData_H_47
AB8           SysData_L_16        AC1           BcData_H_16         AC3           BcData_H_81
AC39          SysData_L_49        AC43          BcData_H_112        AC45          BcCheck_H_13
AC7           SysData_L_17        AD2           BcData_H_17         AD38          SysData_L_50
AD4           BcData_H_82         AD42          BcData_H_48         AD44          BcData_H_113
AD8           SysData_L_18        AE3           BcData_H_18         AE41          BcData_H_114
AE43          BcData_H_49         AE5           SysData_L_19        AF4           BcData_H_83
AF40          SysData_L_51        AF42          BcData_H_50         AF6           SysDataInClk_H_2
AG1           BcData_H_19         AG41          SysDataInClk_H_6    AG45          BcData_H_115
AG7           SysDataOutClk_L_2   AH2           BcDataInClk_H_2     AH38          SysData_L_52
AH40          SysDataOutClk_L_6   AH44          BcDataInClk_H_6     AH6           SysData_L_20
AH8           SysData_L_21        AJ1           Spare               AJ3           BcData_H_84
AJ39          SysData_L_53        AJ43          Spare               AJ45          BcData_H_51
AJ7           SysData_L_22        AK2           BcData_H_20         AK38          SysData_L_55
AK4           BcData_H_85         AK42          BcData_H_52         AK44          BcData_H_116
AK8           SysCheck_L_2        AL3           BcData_H_21         AL41          SysData_L_54
AL43          BcData_H_117        AL5           SysData_L_23        AL7           EV6Clk_L
AM4           BcData_H_87         AM40          SysCheck_L_6        AM42          BcData_H_118
AM6           EV6Clk_H            AM8           ClkIn_H             AN1           BcData_H_86
AN39          SysData_L_56        AN41          BcData_H_55         AN45          BcData_H_53
AN7           ClkIn_L             AP2           BcData_H_23         AP38          SysData_L_57
AP40          BcData_H_120        AP44          BcData_H_54         AP8           SysData_L_24
AR1           BcData_H_22         AR3           Spare               AR39          SysData_L_58
AR43          BcDataOutClk_H_3    AR45          BcData_H_119        AR7           SysData_L_25
AT2           BcCheck_H_2         AT38          SysData_L_59        AT4           Spare
AT42          BcDataOutClk_L_3    AT44          BcCheck_H_14        AT8           SysData_L_26
AU3           BcDataOutClk_H_1    AU41          BcData_H_57         AU43          BcCheck_H_6
AU5           BcData_H_88         AV10          SysData_L_28        AV12          SysData_L_30
AV16          FrameClk_H          AV18          IRQ_H_3             AV22          MiscVref
AV24          SysAddIn_L_11       AV28          SysAddOut_L_13      AV30          SysAddOut_L_7
AV34          SysData_L_63        AV36          SysData_L_61        AV4           BcDataOutClk_L_1
AV40          SysDataInClk_H_7    AV42          BcData_H_122        AV6           SysData_L_27
AV8           PLL_VDD             AW1           BcCheck_H_10        AW11          SysData_L_29
AW13          SysData_L_31        AW15          FrameClk_L          AW17          IRQ_H_2
AW19          SromClk_H           AW23          SysAddIn_L_14       AW27          SysAddIn_L_4
AW29          SysAddOut_L_11      AW33          SysAddOut_L_0       AW35          SysData_L_62
AW39          SysDataOutClk_L_7   AW45          BcData_H_56         AW5           BcData_H_26
AY12          BcData_H_95         AY18          DCOK_H              AY2           BcData_H_24
AY20          Trst_L              AY26          SysAddIn_L_8        AY28          SysAddIn_L_2
AY32          SysAddOut_L_5       AY34          SysCheck_L_7        AY38          SysData_L_60
AY40          BcDataInClk_H_7     AY44          BcData_H_58         AY6           SysDataInClk_H_3
AY8           SysDataOutClk_L_3   B10           BcData_H_0          B12           BcTag_H_23
B16           BcTag_H_31          B18           BcTag_H_35          B22           BcTagParity_H
B24           BcTagValid_H        B28           BcAdd_H_4           B30           BcAdd_H_10
B34           BcAdd_H_18          B36           BcAdd_H_20          B4            BcData_H_68
B40           BcData_H_34         B42           BcData_H_99         B6            BcData_H_67
BA1           BcData_H_89         BA13          SysCheck_L_3        BA15          IRQ_H_0
BA19          TestStat_H          BA21          Tdi_H               BA25          SysVref
BA27          SysAddIn_L_5        BA3           BcData_H_90         BA31          SysAddOut_L_9
BA33          SysAddOut_L_4       BA37          BcData_H_126        BA39          Spare
BA43          BcData_H_59         BA45          BcData_H_121        BA9           BcData_H_93
BB10          BcData_H_30         BB14          NoConnect           BB16          IRQ_H_5
BB2           BcData_H_25         BB20          Tdo_H               BB22          SysDataOutValid_L
BB24          SysAddIn_L_10       BB26          SysAddInClk_L       BB30          SysAddOut_L_14
BB32          SysAddOut_L_8       BB36          BcCheck_H_15        BB38          BcData_H_62
BB4           BcData_H_27         BB42          BcData_H_124        BB44          BcData_H_123
BB8           BcData_H_28         BC11          BcCheck_H_3         BC15          IRQ_H_4
BC17          SromData_H          BC21          Spare               BC23          SysFillValid_L
BC25          SysAddIn_L_9        BC29          SysAddIn_L_1        BC3           BcData_H_91
BC31          SysAddOut_L_12      BC35          SysAddOut_L_3       BC37          BcCheck_H_7
BC41          BcData_H_125        BC43          BcData_H_60         BC5           BcDataInClk_H_3
BC9           BcData_H_94         BD10          BcCheck_H_11        BD12          PllBypass_H
BD16          Reset_L             BD18          Tms_H               BD2           NoConnect
BD22          SysDataInValid_L    BD24          SysAddIn_L_12       BD28          SysAddIn_L_6
BD30          SysAddIn_L_0        BD34          SysAddOutClk_L      BD36          SysAddOut_L_2
BD4           Spare               BD40          BcData_H_127        BD42          BcData_H_61
BD6           BcData_H_92         BE11          ClkFwdRst_H         BE13          IRQ_H_1
BE17          SromOE_L            BE19          Tck_H               BE23          SysAddIn_L_13
BE27          SysAddIn_L_7        BE29          SysAddIn_L_3        BE33          SysAddOut_L_10
BE35          SysAddOut_L_6       BE39          SysAddOut_L_1       BE41          BcData_H_63
BE5           BcData_H_29         BE7           BcData_H_31         BE9           Spare
C11           BcData_H_64         C15           BcTag_H_28          C17           BcTag_H_32
C21           BcTag_H_40          C23           BcTagDirty_H        C25           BcTagOutClk_H
C29           BcAdd_H_8           C3            BcData_H_4          C31           BcAdd_H_12
C35           BcAdd_H_22          C37           BcData_H_33         C41           BcData_H_35
C43           BcData_H_36         C5            BcData_H_3          C9            BcData_H_66
D10           BcData_H_1          D14           BcTag_H_24          D16           BcTag_H_30
D2            BcData_H_71         D20           BcTag_H_38          D22           BcTag_H_41
D24           BcTagOutClk_L       D26           BcDataWr_L          D30           BcAdd_H_11
D32           BcAdd_H_17          D36           BcData_H_97         D4            BcData_H_69
D42           BcData_H_100        D44           BcData_H_101        D8            SysDataInClk_H_0
E1            BcData_H_7          E13           BcTag_H_20          E15           BcTag_H_25
E19           BcTag_H_34          E21           BcTag_H_39          E25           BcTagWr_L
E27           BcAdd_H_5           E3            BcData_H_5          E31           BcAdd_H_16
E33           BcAdd_H_23          E37           SysDataInClk_H_4    E39           Spare
E43           BcData_H_37         E45           BcCheck_H_12        E7            BcDataInClk_H_0
F12           SysData_L_2         F14           SysData_L_0         F18           BcVref
F2            BcCheck_H_0         F20           BcTag_H_37          F26           BcLoad_L
F28           BcAdd_H_9           F32           SysData_L_32        F34           SysData_L_33
F38           BcDataInClk_H_4     F40           SysData_L_36        F44           BcData_H_39
F6            SysData_L_6         F8            Spare               G1            BcData_H_73
G11           SysDataOutClk_L_0   G13           SysData_L_1         G17           BcTag_H_27
G19           BcTagInClk_H        G23           BcTagShared_H       G27           BcAdd_H_7
G29           BcAdd_H_14          G33           BcData_H_32         G35           SysData_L_35
G39           SysData_L_37        G41           BcData_H_38         G45           BcData_H_104
G5            BcData_H_70         G7            SysData_L_5         H10           SysData_L_4
H12           SysData_L_3         H16           BcTag_H_21          H18           BcTag_H_29
H22           BcTag_H_42          H24           BcTagOE_L           H28           BcAdd_H_13
H30           BcAdd_H_21          H34           SysData_L_34        H36           SysDataOutClk_L_4
H4            BcData_H_72         H40           BcData_H_102        H42           BcData_H_103
H6            BcData_H_6          J3            BcData_H_8          J41           SysData_L_39
J43           BcDataOutClk_H_2    J5            BcDataOutClk_H_0    K2            BcData_H_9
K38           SysData_L_38        K4            BcDataOutClk_L_0    K42           BcDataOutClk_L_2
K44           BcData_H_40         K8            SysData_L_7         L1            BcData_H_75
L3            BcData_H_10         L39           SysCheck_L_4        L43           BcData_H_105
L45           BcData_H_106        L7            SysCheck_L_0        M2            BcData_H_11
M38           BcCheck_H_4         M40           SysData_L_40        M44           BcData_H_42
M6            SysData_L_8         M8            BcCheck_H_8         N1            BcData_H_76
N39           SysData_L_41        N41           BcData_H_41         N45           BcData_H_107
N5            BcData_H_74         N7            SysData_L_9         P4            SysDataInClk_H_1
P40           SysData_L_42        P42           BcData_H_43         P6            SysData_L_10
R3            BcDataInClk_H_1     R41           SysDataOutClk_L_5   R43           SysDataInClk_H_5
T2            BcData_H_12         T38           SysData_L_43        T4            Spare
T44           BcData_H_108        T8            SysData_L_11        U1            BcData_H_13
U3            BcData_H_77         U39           BcDataInClk_H_5     U43           BcData_H_44
U45           BcData_H_109        U7            SysDataOutClk_L_1   V2            BcData_H_14
V38           Spare               V40           SysData_L_44        V44           BcData_H_45
V6            SysData_L_13        V8            SysData_L_12        W1            BcData_H_79
W39           SysData_L_46        W41           SysData_L_45        W45           BcData_H_110
W5            BcData_H_78         W7            SysData_L_14        Y4            BcData_H_15
Y40           SysData_L_47        Y42           BcData_H_46         Y6            SysData_L_15


Table 3-6 lists the 21264A ground (VSS) and power (VDD) pins.


Table 3-6 Ground and Power (VSS and VDD) Pin List

Signal  PGA Location

VSS     A15   A21   A25   A3    A31   A37   A43   A9    AA1   AA39
        AA45  AA7   AC41  AC5   AE1   AE39  AE45  AE7   AG3   AG39
        AG43  AG5   AJ41  AJ5   AL1   AL39  AL45  AN3   AN43  AN5
        AR41  AR5   AU1   AU39  AU45  AU7   AW21  AW25  AW3   AW31
        AW37  AW41  AW43  AW7   AW9   AY14  BA11  BA17  BA23  BA29
        BA35  BA41  BA5   BA7   BC1   BC13  BC19  BC27  BC33  BC39
        BC45  BC7   BE15  BE21  BE25  BE3   BE31  BE37  BE43  C1
        C13   C19   C27   C33   C39   C45   C7    D38   E11   E17
        E23   E29   E35   E41   E5    E9    G15   G21   G25   G3
        G31   G37   G43   G9    J1    J39   J45   J7    L41   L5
        N3    N43   R1    R39   R45   R5    R7    T42   U41   U5
        W3    W43

VDD     A23   AB40  AB6   AD40  AD6   AF2   AF38  AF44  AF8   AH4
        AH42  AK40  AK6   AM2   AM38  AM44  AP4   AP42  AP6   AT40
        AT6   AV14  AV2   AV20  AV26  AV32  AV38  AV44  AY10  AY16
        AY22  AY24  AY30  AY36  AY4   AY42  B14   B2    B20   B26
        B32   B38   B44   B8    BB12  BB18  BB28  BB34  BB40  BB6
        BD14  BD20  BD26  BD32  BD38  BD44  BD8   D12   D18   D28
        D34   D40   D6    F10   F16   F22   F24   F30   F36   F4
        F42   H14   H2    H20   H26   H32   H38   H44   K40   K6
        M4    M42   P2    P38   P44   P8    T40   T6    V4    V42
        Y2    Y38   Y44   Y8


3.4 Mechanical Specifications 


This section shows the 21264A mechanical package dimensions without a heat sink.
For heat sink information and dimensions, refer to Chapter 10.


Figure 3-2 shows the package physical dimensions without a heat sink.


Figure 3-2 Package Dimensions


3.5 21264A Packaging 


Figure 3-3 shows the 21264A pinout from the top view with pins facing down. 


Figure 3-3 21264A Top View (Pin Down)


Figure 3-4 shows the 21264A pinout from the bottom view with pins facing up.


Figure 3-4 21264A Bottom View (Pin Up)


4


Cache and External Interfaces


This chapter describes the 21264A cache and external interface, which includes the sec- 
ond-level cache (Bcache) interface and the system interface. It also describes locks, 
interrupt signals, and ECC/parity generation. It is organized as follows: 


• Introduction to the external interfaces
• Physical address considerations
• Bcache structure
• Victim data buffer
• Cache coherency
• Lock mechanism
• System port
• Bcache port
• Interrupts


Chapter 3 lists and defines all 21264A hardware interface signal pins. Chapter 9
describes the 21264A hardware interface electrical requirements.


4.1 Introduction to the External Interfaces 


A 21264A-based system can be divided into three major sections: 


• 21264A microprocessor
• Second-level Bcache
• System interface logic
  - Optional duplicate tag store
  - Optional lock register
  - Optional victim buffers


The 21264A external interface is flexible and mandates few design rules, allowing a 
wide range of prospective systems. The external interface is composed of the Bcache 
interface and the system interface. 


Input clocks must have the same frequency as their corresponding output clock. For
example, the frequency of SysAddInClk_L must be the same as that of
SysAddOutClk_L.


• The Bcache interface includes a 128-bit bidirectional data bus, a 20-bit
  unidirectional address bus, and several control signals.

  - The BcDataOutClk_x[3:0] clocks are free-running and are derived from the
    internal GCLK. The period of BcDataOutClk_x[3:0] is a programmable
    multiple of GCLK.

  - The Bcache turns the BcDataOutClk_x[3:0] clocks around and returns them to
    the 21264A as BcDataInClk_H[7:0]. Likewise, BcTagOutClk_x returns as
    BcTagInClk_H.

  - The Bcache interface supports a 64-byte block size.

• The system interface includes a 64-bit bidirectional data bus, two 15-bit
  unidirectional address buses, and several control signals.

  - The SysAddOutClk_L clock is free-running and is derived from the internal
    GCLK. The period of SysAddOutClk_L is a programmable multiple of GCLK.

  - The SysAddInClk_L clock is a turned-around copy of SysAddOutClk_L.


Figure 4-1 shows a simplified view of the external interface. The function and purpose
of each signal is described in Chapter 3.


Figure 4-1 21264A System and Bcache Interfaces


4.1.1 System Interface


This section introduces the system (external) bus interface. The system interface is
made up of two unidirectional 15-bit address buses, 64 bidirectional data lines, eight
bidirectional check bits, two single-ended unidirectional clocks, and a few control pins.
The 15-bit address buses provide time-shared address/command/ID in two or four
GCLK cycles. The Cbox controls the system interface.


4.1.1.1 Commands and Addresses 


The system sends probe and data movement commands to the 21264A. The 21264A 
can hold up to eight probe commands from the system. The system controls the number 
of outstanding probe commands and must ensure that the 21264A 8-entry probe queue 
does not overflow. 
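One way for system logic to honor the 8-entry limit is to keep a count of probes it has issued that the 21264A has not yet retired. The Python sketch below models such a counter. The class and method names are hypothetical, and the event that tells the system a probe has completed is assumed rather than taken from this section.

```python
PROBE_QUEUE_DEPTH = 8  # 21264A probe queue entries, per the text above


class ProbeCredits:
    """Hypothetical system-side count of probes outstanding in the 21264A."""

    def __init__(self, depth=PROBE_QUEUE_DEPTH):
        self.depth = depth
        self.outstanding = 0

    def can_send(self):
        # A new probe is allowed only while a queue entry is free.
        return self.outstanding < self.depth

    def send_probe(self):
        if not self.can_send():
            raise RuntimeError("probe queue would overflow")
        self.outstanding += 1

    def probe_completed(self):
        # Assumed to be called when the system observes that a probe finished.
        if self.outstanding == 0:
            raise RuntimeError("no probe outstanding")
        self.outstanding -= 1


if __name__ == "__main__":
    credits = ProbeCredits()
    for _ in range(PROBE_QUEUE_DEPTH):
        credits.send_probe()
    print(credits.can_send())   # all eight entries are in use
    credits.probe_completed()
    print(credits.can_send())   # one entry has been freed
```

The counter is deliberately conservative: it refuses a ninth probe instead of relying on the processor to apply back pressure, which matches the requirement that the system, not the 21264A, prevents overflow.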


The Cbox contains an 8-entry miss buffer (MAF) and an 8-entry victim buffer (VAF). 


A miss occurs when the 21264A probes the Bcache but does not find the addressed 
block. The 21264A can queue eight cache misses to the system in its MAF. 


4.1.2 Second-Level Cache (Bcache) Interface 


The 21264A Cbox provides control signals and an interface for a second-level cache,
the Bcache. The 21264A supports a Bcache from 1 MB to 16 MB, with 64-byte blocks.
A 128-bit data bus is used for transfers between the 21264A and the Bcache. The
Bcache must be composed of synchronous static RAMs (SSRAMs) and must contain
either one, two, or three internal registers. All Bcache control and address pins are
clocked synchronously on Bcache cycle boundaries. The Bcache clock rate varies as a
multiple of the CPU clock cycle in half-cycle increments from 1.5 to 4.0, and in
full-cycle increments of 5, 6, 7, and 8 times the CPU clock cycle. The 1.5 multiple is
only available in dual-data mode.
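The set of permitted Bcache clock multiples can be written down explicitly. The following Python sketch checks a candidate multiple against the rules stated above. The function name and the dual_data_mode flag are illustrative assumptions; they do not correspond to any register or field defined in this specification.

```python
from fractions import Fraction

# Half-cycle steps from 1.5 to 4.0, then whole multiples 5 through 8.
HALF_STEP_MULTIPLES = [Fraction(n, 2) for n in range(3, 9)]
FULL_STEP_MULTIPLES = [Fraction(n) for n in (5, 6, 7, 8)]


def is_valid_bcache_multiple(multiple, dual_data_mode=False):
    """Return True if the Bcache clock period may be `multiple` CPU cycles."""
    m = Fraction(multiple)
    if m == Fraction(3, 2):
        # The 1.5 multiple is only available in dual-data mode.
        return dual_data_mode
    return m in HALF_STEP_MULTIPLES or m in FULL_STEP_MULTIPLES


if __name__ == "__main__":
    for candidate in (1.5, 2, 2.5, 4, 4.5, 5, 8, 9):
        print(candidate, is_valid_bcache_multiple(candidate))
```

Fractions are used instead of floating-point comparisons so that the half-cycle values are matched exactly.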


4.2 Physical Address Considerations 


The 21264A supports a 44-bit physical address space that is divided equally between 
memory space and I/O space. Memory space resides in the lower half of the physical 
address space (PA[43] = 0) and I/O space resides in the upper half of the physical 
address space (PA[43] = 1). The 21264A recognizes these spaces internally. 
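The split between memory space and I/O space depends only on PA[43]. As a minimal Python sketch of that rule (the constant and function names are illustrative, not taken from the specification):

```python
PA_BITS = 44                        # width of the physical address
IO_SPACE_BIT = 1 << (PA_BITS - 1)   # PA[43]


def is_io_space(pa):
    """Return True if the physical address lies in I/O space (PA[43] = 1)."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("address does not fit in 44 bits")
    return bool(pa & IO_SPACE_BIT)


if __name__ == "__main__":
    print(is_io_space(0x000_0000_1000))   # lower half: memory space
    print(is_io_space(0x800_0000_1000))   # upper half: I/O space
```

Because the two halves are equal in size, each space spans 2^43 bytes.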


The 21264A-generated external references to memory space are always of a fixed 
64-byte size, though the internal access granularity is byte, word, longword, or quad- 
word. All 21264A-generated external references to memory or I/O space are physical 
addresses that are either successfully translated from a virtual address or produced by 
PALcode. Speculative execution may cause a reference to nonexistent memory. Sys- 
tems must check the range of all addresses and report nonexistent addresses to the 
21264A. 


Table 4—1 describes the translation of internal references to external interface refer- 
ences. The first column lists the instructions used by the programmer, including load 
(LDx) and store (STx) instructions of several sizes. The column headings are described 
here: 


• DcHit (block was found in the Dcache)
• DcW (block was found in a writable state in the Dcache)
• BcHit (block was found in the Bcache)
• BcW (block was found in a writable state in the Bcache)
• Status and Action (status at end of instruction and action performed by the 21264A)


Prefetches (LDL, LDF, LDG, LDT, LDBU, LDWU) to R31 use the LDx flow, and
prefetch with modify intent (LDS) uses the STx flow. If the prefetch target is addressed
to I/O space, the upper address bit is cleared, converting the address to memory space
(PA[42:6]). Notes follow the table.


Table 4-1 Translation of Internal References to External Interface Reference

Instruction           DcHit  DcW  BcHit  BcW  Status and Action
LDx Memory            1      X    X      X    Dcache hit, done.
LDx Memory            0      X    1      X    Bcache hit, done.
LDx Memory            0      X    0      X    Miss, generate RdBlk command.
LDx I/O               X      X    X      X    RdBytes, RdLWs, or RdQWs based on size.
Istream Memory        1      X    X      X    Dcache hit, Istream serviced from Dcache.
Istream Memory        0      X    1      X    Bcache hit, Istream serviced from Bcache.
Istream Memory        0      X    0      X    Miss, generate RdBlkI command.
STx Memory            1      1    X      X    Store Dcache hit and writable, done.
STx Memory            1      0    X      X    Store hit and not writable, set-dirty flow (note 1).
STx Memory            0      X    1      1    Store Bcache hit and writable, done.
STx Memory            0      X    1      0    Store hit and not writable, set-dirty flow (note 1).
STx Memory            0      X    0      X    Miss, generate RdBlkMod command.
STx I/O               X      X    X      X    WrBytes, WrLWs, or WrQWs based on size.
STx_C Memory          0      X    X      X    Fail STx_C.
STx_C Memory          1      0    X      X    STx_C hit and not writable, set-dirty flow (note 1).
STx_C I/O             X      X    X      X    Always succeed and WrQWs or WrLWs are generated,
                                              based on the size.
WH64 Memory           1      1    X      X    Hit, done.
WH64 Memory           1      0    X      X    WH64 hit not writable, set-dirty flow (note 1).
WH64 Memory           0      X    1      1    WH64 hit dirty, done.
WH64 Memory           0      X    1      0    WH64 hit not writable, set-dirty flow (note 1).
WH64 Memory           0      X    0      X    Miss, generate InvalToDirty command (note 2).
WH64 I/O              X      X    X      X    NOP the instruction. WH64 is UNDEFINED for I/O
                                              space.
ECB Memory            X      X    X      X    Generate evict command (note 3).
ECB I/O               X      X    X      X    NOP the instruction. ECB instruction is UNDEFINED
                                              for I/O space.
MB/WMB                X      X    X      X    Generate MB command (note 4). Also see Section 3.2.5.
TBFill Flows


Table 4-1 notes: 


1. Set Dirty Flow: Based on the Cbox CSR SET_DIRTY_ENABLE[2:0], SetDirty 
requests can be either internally acknowledged (called a SetModify) or sent to the 
system environment for processing. When externally acknowledged, the shared sta- 
tus information for the cache block is also broadcast. The commands sent exter- 
nally are SharedToDirty or CleanToDirty. Based on the Cbox CSR 
ENABLE_STC_COMMAND[0], the external system can be informed of a STx_C
generating a SetDirty using the STCChangeToDirty command. See Table 4-16 for 
more information. 


2. InvalToDirty: Based on the Cbox CSR INVAL_TO_DIRTY_ENABLE[1:0], InvalToDirty
requests can be either internally acknowledged or sent to the system environment
as InvalToDirty commands. This Cbox CSR provides the ability to convert
WH64 instructions to RdModx operations. See Table 4-15 for more information.


3. Evict: There are two aspects to the commands that are generated by an ECB 
instruction: first, those commands that are generated to notify the system of an evict 
being performed; second, those commands that are generated by any victim that is 
created by servicing the ECB. 


If Cbox CSR ENABLE_EVICT[0] is clear, no command is issued by the 21264A
on the external interface to notify the system of an evict being performed. If Cbox 
CSR ENABLE_EVICT[0] is set, the 21264A issues an Evict command on the sys- 
tem interface only if a Bcache index match to the ECB address is found in the 
21264A cache system. 


The 21264A can issue the commands CleanVictimBlk and WrVictimBlk for a vic- 
tim that is created by an ECB. CleanVictimBlk is issued only if Cbox CSR 
BC_CLEAN_VICTIM is set and there is a Bcache index match valid but not dirty 
in the 21264A cache system. WrVictimBlk is issued for any Bcache match of the 
ECB address that is dirty in the 21264A cache system. 


4. MB: Based on the Cbox CSR SYSBUS_MB_ENABLE, the MB command can be 
sent to the pins. 


Each of these CSRs is programmed appropriately, based on the cache coherence proto- 
col used by the system environment. For example, uniprocessor systems would prefer 
to internally acknowledge most of these transactions. In contrast, multiprocessor sys- 
tems may require notification and control of any change in cache state. The 21264A and 
the external system must cooperate to maintain cache coherence. Section 4.5 explains 
the 21264A part of the cache coherency protocol. 
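

The memory-space LDx and STx rows of Table 4-1 can be read as a small decision function. The following Python sketch is illustrative only; the function name and the returned strings are hypothetical labels, and the set-dirty flow itself (note 1) is not modeled.

```python
def memory_reference_action(kind: str, dc_hit: bool, dc_w: bool,
                            bc_hit: bool, bc_w: bool) -> str:
    """Return the Table 4-1 action for a memory-space LDx or STx reference.

    kind is "LDx" or "STx". The flags correspond to the DcHit, DcW, BcHit,
    and BcW columns; a flag shown as X in the table is simply ignored here.
    """
    if kind == "LDx":
        if dc_hit:
            return "Dcache hit, done"
        if bc_hit:
            return "Bcache hit, done"
        return "RdBlk"
    if kind == "STx":
        if dc_hit:
            return "done" if dc_w else "set-dirty flow"
        if bc_hit:
            return "done" if bc_w else "set-dirty flow"
        return "RdBlkMod"
    raise ValueError("only LDx and STx memory references are modeled")
```

The Dcache is checked before the Bcache in both flows, which matches the ordering of the rows in the table.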


4.3 Bcache Structure 


The 21264A Cbox provides control signals and an interface for a second-level cache 
(Bcache). 


The 21264A supports a Bcache from 1MB to 16MB, with 64-byte blocks. A 128-bit 
bidirectional data bus is used for transfers between the 21264A and the Bcache. The 
Bcache is fully synchronous and the synchronous static RAMs (SSRAMs) must contain 
either one, two, or three internal registers. All Bcache control and address pins are 
clocked synchronously on Bcache cycle boundaries. The Bcache clock rate varies as a 
multiple of the CPU clock cycle in half-cycle increments from 1.5 to 4.0, and in full- 
cycle increments of 5, 6, 7, and 8 times the CPU clock cycle. The 1.5 multiple is only 
available in dual-data mode. 
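

As an illustration of the clock multiples listed above, the following Python sketch enumerates the legal values and derives a Bcache clock rate from a CPU clock rate. It assumes that a multiple of N means the Bcache clock period is N CPU cycles; the function names are hypothetical and the example frequency is arbitrary.

```python
from fractions import Fraction


def valid_bcache_multiples(dual_data: bool) -> list:
    """List the Bcache clock multiples named in the text (hypothetical helper)."""
    half_steps = [Fraction(n, 2) for n in range(3, 9)]   # 1.5, 2.0, ... 4.0
    full_steps = [Fraction(n) for n in (5, 6, 7, 8)]
    multiples = half_steps + full_steps
    if not dual_data:
        # The 1.5 multiple is only available in dual-data mode.
        multiples = [m for m in multiples if m != Fraction(3, 2)]
    return multiples


def bcache_clock_mhz(cpu_mhz: float, multiple: float, dual_data: bool) -> float:
    """Bcache clock rate, assuming period = multiple x CPU cycle (an assumption)."""
    m = Fraction(multiple)
    if m not in valid_bcache_multiples(dual_data):
        raise ValueError("unsupported Bcache clock multiple")
    return float(Fraction(cpu_mhz) / m)
```

With a 600 MHz CPU clock (an arbitrary example value), a 1.5 multiple in dual-data mode gives a 400 MHz Bcache clock under this assumption.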


4.3.1 Bcache Interface Signals 


Figure 4-2 shows the 21264A Bcache interface signals.
Figure 4-2 21264A Bcache Interface Signals 
4.3.2 System Duplicate Tag Stores


The 21264A provides Bcache state support for systems with and without duplicate tag 
stores, and will take different actions on this basis. The system sets the Cbox CSR 
DUP_TAG_ENA[0], indicating that it has a duplicate tag store for the Bcache. Systems
using the DUP_TAG_ENA[0] bit must also use the Cbox CSR
BC_CLEAN_VICTIM[0] bit to avoid deadlock situations. 


Systems using a Bcache duplicate tag store can accelerate system performance by: 


• Issuing probes and SysDc fill commands to the 21264A out-of-order with respect to
  their order at the system serialization point


• Filtering out all probe misses from the 21264A cache system


If a probe misses in the 21264A cache system (Bcache miss and VAF miss), the 
21264A stalls probe processing with the expectation that a SysDc fill will allocate this 
block. Because of this, in duplicate tag mode, the 21264A can never generate a probe 
miss response. 


When Cbox CSR DUP_TAG_ENA[0] equals 0, the 21264A delivers a miss response 
for probes that do not hit in its cache system. 
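

The effect of DUP_TAG_ENA[0] on a probe that misses can be summarized in a short Python sketch. The function name and the returned strings are hypothetical labels chosen for illustration.

```python
def probe_miss_action(dup_tag_ena: bool) -> str:
    """Describe what the 21264A does when a probe misses the Bcache and the VAF."""
    if dup_tag_ena:
        # Duplicate tag mode: the system is expected to have filtered probe
        # misses, so the processor waits for the SysDc fill to allocate the block.
        return "stall probe processing until a SysDc fill allocates the block"
    return "deliver a miss response"
```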


4.4 Victim Data Buffer 
The 21264A has eight victim data buffers (VDBs). They have the following properties: 


• The VDBs are used for both victims (fills that are replacing dirty cache blocks) and
  for system probes that require data movement. The CleanVictimBlk command
  (optional) assigns and uses a VDB.


• Each VDB has two valid bits that indicate the buffer is valid for a victim or valid
  for a probe or valid for both a victim and a probe. Probe commands that match the
  address of a victim address file (VAF) entry with an asserted probe-valid bit (P)
  will stall the 21264A probe queue. No ProbeResponses will be returned until the P
  bit is clear.


• The release victim buffer (RVB) bit, when asserted, causes the victim valid bit, on
  the victim data buffer (VDB) specified in the ID field, to be cleared. The RVB bit
  will also clear the IOWB when systems move data on I/O write transactions. In this
  case, ID[3] equals one.


• The release probe buffer (RPB) bit, when asserted (with a WriteData or
  ReleaseBuffer SysDc command), clears the P bit in the victim buffer entry specified
  in the ID field.


• Read data commands and victim write commands use IDs 0-7, while IDs 8-11 are
  used to address the four I/O write buffers.
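

The ID ranges above can be decoded as shown in the following Python sketch. The function name is hypothetical, and numbering the I/O write buffers 0 through 3 by subtracting 8 is an assumption made for illustration; the text only states that IDs 8-11 address them and that ID[3] equals one in that case.

```python
def decode_buffer_id(buf_id: int) -> tuple:
    """Map a SysDc ID field value to (buffer kind, index within that kind)."""
    if 0 <= buf_id <= 7:
        return ("VDB", buf_id)            # eight victim data buffers
    if 8 <= buf_id <= 11:
        # ID[3] is set for the four I/O write buffers (IOWBs).
        return ("IOWB", buf_id - 8)       # index 0..3 is an assumed numbering
    raise ValueError("IDs 12-15 are not described in this section")
```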


4.5 Cache Coherency 


This section describes the basics and protocols of the 21264A cache coherency scheme. 


4.5.1 Cache Coherency Basics 




The 21264A systems maintain the cache hierarchy shown in Figure 4—3. 


Figure 4-3 Cache Subset Hierarchy

[Figure not reproduced. The only label that survives from the drawing is "Bcache".]


The following tasks must be performed to maintain cache coherency: 


• Istream data from memory spaces may be cached in the Icache and Bcache. Icache
  coherence is not maintained by hardware; it must be maintained by software using
  the CALL_PAL IMB instruction.


• The 21264A maintains the Dcache as a subset of the Bcache. The Dcache is
  set-associative but is kept a subset of the larger externally implemented direct-mapped
  Bcache.


• System logic must help the 21264A to keep the Bcache coherent with main memory
  and other caches in the system.


• The 21264A requires the system to allow only one change to a block at a time. This
  means that if the 21264A gains the bus to read or write a block, no other node on
  the bus should be allowed to access that block until the data has been moved.


• The 21264A provides hardware mechanisms to support several cache coherency
  protocols. The protocols can be separated into two classes: write invalidate cache
  coherency protocol and flush cache coherency protocol.


4.5.2 Cache Block States 
Table 4—2 lists the cache block states supported by the 21264A. 


Table 4-2 21264A-Supported Cache Block States 








State Name      Description
Invalid         The 21264A does not have a copy of the block.
Clean           This 21264A holds a read-only copy of the block, and no other agent in the system
                holds a copy. Upon eviction, the block is not written to memory.
Clean/Shared    This 21264A holds a read-only copy of the block, and at least one other agent in the
                system may hold a copy of the block. Upon eviction, the block is not written to memory.
Dirty           This 21264A holds a read-write copy of the block, and must write it to memory after
                it is evicted from the cache. No other agent in the system holds a copy of the block.
Dirty/Shared    This 21264A holds a read-only copy of the dirty block, which may be shared with
                another agent. The block must be written back to memory when it is evicted.


4.5.3 Cache Block State Transitions 


Cache block state transitions are reflected by 21264A-generated commands to the sys- 
tem. Cache block state transitions can also be caused by system-generated commands to 
the 21264A (probes). Probes control the next state for the cache block. The next state 
can be based on the previous state of the cache block. Table 4—3 lists the next state for 
the cache block. 


Table 4-3 Cache Block State Transitions

Next State                        Action Based on Probe Hit
No change                         Do not update cache state. Useful for DMA transactions that sample
                                  data but do not want to update tag state.
Clean                             Independent of previous state, update next state to Clean.
Clean/Shared                      Independent of previous state, update next state to Clean/Shared.
                                  This transaction is useful for systems that update memory on probe
                                  hits.
T1:                               Based on the dirty bit, make the block clean or dirty shared. This
  Clean => Clean/Shared           transaction is useful for systems that do not update memory on
  Dirty => Dirty/Shared           probe hits.
T3:                               If the block is Clean or Dirty/Shared, change to Clean/Shared. If
  Clean => Clean/Shared           the block is Dirty, change to Invalid. This transaction is useful
  Dirty => Invalid                for systems that use the Dirty/Shared state as an exclusive state.
  Dirty/Shared => Clean/Shared
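

The next-state rules of Table 4-3 can be expressed as a lookup, sketched below in Python. The function name and the string labels are hypothetical. The table does not list every starting state for T1 and T3; leaving an unlisted state unchanged is an assumption of this sketch, not a statement from the specification.

```python
# Transitions that the table lists explicitly for the two conditional probe types.
T1 = {"Clean": "Clean/Shared", "Dirty": "Dirty/Shared"}
T3 = {"Clean": "Clean/Shared", "Dirty": "Invalid", "Dirty/Shared": "Clean/Shared"}


def next_state_on_probe_hit(next_state_field: str, current: str) -> str:
    """Return the block state after a probe hit, following Table 4-3."""
    if current == "Invalid":
        raise ValueError("a probe hit implies the block is present in the cache")
    if next_state_field == "No change":
        return current
    if next_state_field in ("Clean", "Clean/Shared"):
        return next_state_field           # independent of the previous state
    if next_state_field == "T1":
        return T1.get(current, current)   # assumption: unlisted states unchanged
    if next_state_field == "T3":
        return T3.get(current, current)   # assumption: unlisted states unchanged
    raise ValueError("unknown next-state field")
```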


The cache state transitions caused by 21264A-generated commands are under the full 
control of the system environment using the SysDc (system data control) commands.
Table 4—4 lists these commands. 


Table 4-4 System Responses to 21264A Commands 


Response Type                  21264A Action
SysDc ReadData                 Fill block with the associated data and update tag with clean cache status.
SysDc ReadDataDirty            Fill block with the associated data and update tag with dirty cache status.
SysDc ReadDataShared           Fill block with the associated data and update tag with shared cache status.
SysDc ReadDataShared/Dirty     Fill block with the associated data and update tag with dirty/shared status.
SysDc ReadDataError            Fill block with all-ones reference pattern and update tag with invalid status.
SysDc ChangeToDirtySuccess     Unconditionally update block with dirty cache status.
SysDc ChangeToDirtyFail        Do not update cache status and fail any associated STx_C instructions.
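

The tag updates in Table 4-4 can be modeled as a mapping from the SysDc response to the resulting cache status. This Python sketch is illustrative only; the function name and the tuple it returns are hypothetical, and data movement is not modeled.

```python
# Tag status written by each fill response, using the wording of Table 4-4.
FILL_STATUS = {
    "ReadData": "clean",
    "ReadDataDirty": "dirty",
    "ReadDataShared": "shared",
    "ReadDataShared/Dirty": "dirty/shared",
    "ReadDataError": "invalid",   # block is filled with the all-ones pattern
}


def apply_sysdc_response(response: str, current_status: str) -> tuple:
    """Return (new tag status, whether an associated STx_C must fail)."""
    if response in FILL_STATUS:
        return FILL_STATUS[response], False
    if response == "ChangeToDirtySuccess":
        return "dirty", False             # unconditional update to dirty
    if response == "ChangeToDirtyFail":
        return current_status, True       # status untouched, STx_C fails
    raise ValueError("unknown SysDc response")
```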


4.5.4 Using SysDc Commands 


Note the following: 


• The conventional response for RdBlk commands is SysDc ReadData or
  ReadDataShared.


• The conventional response for a RdBlkMod command is SysDc ReadDataDirty.


• The conventional response for ChangeToDirty commands is
  ChangeToDirtySuccess or ChangeToDirtyFail.


However, the system environment is not limited to these responses. Table 4-5 shows all 
21264A commands, system responses, and the 21264A reaction. The 21264A com- 
mands are described in the following list: 


• Rdx commands are generated by load or Istream references.


• RdBlkModx commands are generated by store references.


• The ChxToDirty command group includes CleanToDirty, SharedToDirty, and
  STCChangeToDirty commands, which are generated by store references that hit in the
  21264A cache system.


• InvalToDirty commands are generated by WH64 instructions that miss in the
  21264A cache system.


• FetchBlk and FetchBlkSpec are noncached references to memory space that have
  missed in the 21264A cache system.


• Rdiox commands are noncached references to I/O address space.


• Evict and STCChangeToDirty commands are generated by ECB and STx_C
  instructions, respectively.


Table 4—5 shows the system responses to 21264A commands and 21264A reactions. 


Table 4-5 System Responses to 21264A Commands and 21264A Reactions

21264A CMD        SysDc                   21264A Action

Rdx               ReadData                This is a normal fill. The cache block is filled and marked
                  ReadDataShared          clean or shared based on SysDc.

Rdx               ReadDataShared/Dirty    The cache block is filled and marked dirty/shared. Succeeding
                                          store commands cannot update the block without external
                                          reference.

Rdx               ReadDataDirty           The cache block is filled and marked dirty.

Rdx               ReadDataError           The cache block access was to NXM address space. The 21264A
                                          delivers an all-ones pattern to any load command and evicts
                                          the block from the cache (with associated victim processing).
                                          The cache block is marked invalid.

Rdx               ChangeToDirtySuccess    Both SysDc responses are illegal for read commands.
                  ChangeToDirtyFail

RdBlkModx         ReadData                The cache block is filled and marked with a nonwritable
                  ReadDataShared          status. If the store instruction that generated the RdBlkModx
                  ReadDataShared/Dirty    command is still active (not killed), the 21264A will retry
                                          the instruction, generating the appropriate ChangeToDirty
                                          command. Succeeding store commands cannot update the block
                                          without external reference.

RdBlkModx         ReadDataDirty           The 21264A performs a normal fill response, and the cache
                                          block becomes writable.

RdBlkModx         ChangeToDirtySuccess    Both SysDc responses are illegal for read/modify commands.
                  ChangeToDirtyFail

RdBlkModx         ReadDataError           The cache block command was to NXM address space. The 21264A
                                          delivers an all-ones pattern to any dependent load command,
                                          forces a fail action on any pending store commands to this
                                          block, and any store to this block is not retried. The Cbox
                                          evicts the cache block from the cache system (with associated
                                          victim processing). The cache block is marked invalid.

ChxToDirty        ReadData                The original data in the Dcache is replaced with the filled
                  ReadDataShared          data. The block is not writable, so the 21264A will retry the
                  ReadDataShared/Dirty    store instruction and generate another ChxToDirty class
                                          command. To avoid a potential livelock situation, the
                                          STC_ENABLE CSR bit must be set. Any STx_C instruction to this
                                          block is forced to fail. In addition, a Shared/Dirty response
                                          causes the 21264A to generate a victim for this block upon
                                          eviction.

ChxToDirty        ReadDataDirty           The data in the Dcache is replaced with the filled data. The
                                          block is writable, so the store instruction that generated
                                          the original command can update this block. Any STx_C
                                          instruction to this block is forced to fail. In addition, the
                                          21264A generates a victim for this block upon eviction.

ChxToDirty        ReadDataError           Impossible situation. The block must be cached to generate a
                                          ChxToDirty command. Caching the block is not possible because
                                          all NXM fills are filled noncached.

ChxToDirty        ChangeToDirtySuccess    Normal response. ChangeToDirtySuccess makes the block
                                          writable. The 21264A retries the store instruction and
                                          updates the Dcache. Any STx_C instruction associated with
                                          this block is allowed to succeed.

ChxToDirty        ChangeToDirtyFail       The MAF entry is retired. Any STx_C instruction associated
                                          with the block is forced to fail. If a STx instruction
                                          generated this block, the 21264A retries and generates either
                                          a RdBlkModx (because the reference that failed the
                                          ChangeToDirty also invalidated the cache by way of an
                                          invalidating probe) or another ChxToDirty command.

InvalToDirty      ReadData                The block is not writable, so the 21264A will retry the WH64
                  ReadDataShared          instruction and generate a ChxToDirty command.
                  ReadDataShared/Dirty

InvalToDirty      ReadDataError           The 21264A doesn't send InvalToDirty commands offchip
                                          speculatively. This NXM condition is a hard error. Systems
                                          should perform a machine check.

InvalToDirty      ReadDataDirty           The block is writable. Done.
                  ChangeToDirtySuccess

InvalToDirty      ChangeToDirtyFail       Illegal. InvalToDirty instructions must provide a cache
                                          block.

Fetchx            ReadData                The 21264A delivers the data block, independent of its
Rdiox             ReadDataShared          status, to waiting load instructions and does not cache the
                  ReadDataShared/Dirty    block in the 21264A cache system.
                  ReadDataDirty

Fetchx            ReadDataError           The cache block address was to an NXM address space. The
                                          21264A delivers the all-ones patterns to any dependent load
                                          instructions and does not cache the block in the 21264A cache
                                          system.

Rdiox             ReadDataError           The cache block access was to NXM address space. The 21264A
                                          delivers an all-ones pattern to any load command and does not
                                          cache the block in the 21264A cache system.

Evict             ChangeToDirtyFail       Retiring the MAF entry is the only legal response.

STCChangeToDirty  ReadDataX               All fill and ChangeToDirtyFail responses will fail the STx_C
                  ChangeToDirtyFail       requirements.

STCChangeToDirty  ChangeToDirtySuccess    The STx_C instruction succeeds.

MB                MBDone                  Acknowledgment for MB.


The 21264A sends a WrVictimBlk command to the system when it evicts a Dirty or
Dirty/Shared cache block. The 21264A may be configured to send a CleanVictimBlk to
the system (by way of the Cbox CSR BC_CLEAN_VICTIM[0]) when evicting a clean
or shared block. Both commands allocate buffers in the VAF (victim address file). This
buffer is a coherent part of the 21264A cache system. Write data control and deallocation
of the VAF can be directly controlled by using the SysDc WriteData and ReleaseBuffer
commands.


4.5.5 Dcache States and Duplicate Tags 


Each Dcache block contains an extra state bit (modified bit), beyond those required to
support the cache protocol. If set, this bit indicates that the associated block should be 
written to the Bcache when it is evicted from the Dcache. The modified bit is set in two 
cases: 


1. When a block is filled into the Dcache from memory its modified bit is set, ensur- 
ing that it also gets written back into the Bcache at some future time. 


2. When the processor writes to a dirty Dcache block the modified bit is set, indicating 
it should be written to the Bcache when evicted. 


The contents of the modified bit are functionally invisible to the external cache
environment, but knowledge of the bit's function is useful to programmers optimizing the
scheduling of the Bcache data bus.


The Cbox contains a duplicate copy of the Dcache tag array. In contrast to the Dcache 
tag array (DTAG), which is virtually indexed, the Cbox copy of the Dcache tag array 
(CTAG) is physically-indexed. The Cbox uses the CTAG array entries in the following 
situations. 


1. When the Mbox requests a Dcache fill, the Cbox uses the CTAG array entry to find
if the Dcache already contains the requested physical address in another
virtually-indexed Dcache line. If it does, the Cbox invalidates that cache line after first
writing the data back to the Bcache if it was in the modified state. The Cbox also checks
to see if the Dcache contains an address that is different from the requested address but
maps to the same Bcache line. If it does, the Dcache line is evicted in order to keep
the Dcache a subset of the Bcache.


2. When the Ibox requests an Icache fill, the Cbox uses the CTAG array entries to find
if the Dcache contains the requested physical address in the modified state. If it
does, the Cbox forces the line to be written back to the Bcache before servicing the
Icache fill request. The Cbox also checks to see if the Dcache contains an address
different from the requested address but which maps to the same Bcache line. In
this case the Istream request will miss the Bcache, and the Cbox will
service the request by launching a noncached Fetch command to the system port
and will not put the Istream block into the Bcache. This mechanism allows the
21264A to use a cache resident lock flag for LDx_L/STx_C instructions.


3. The Cbox uses the CTAG array entries to find whether probe addresses are held in 
the Dcache without interrupting load/store instruction processing in the processor 
core. 


4.6 Lock Mechanism 


The 21264A does not contain a dedicated lock register, nor are system components 
required to do so. 
When a load-lock (LDx_L) instruction executes, data is accessed from the Dcache or
Bcache. If there is a cache miss, data is accessed from memory with a RdBlk command.
Its associated cache line is filled into the Dcache in the clean state, if it is not already
there.


When the store-conditional (STx_C) instruction executes, it is allowed to succeed if its 
associated cache line is still present in the Dcache and can be made writable; otherwise, 
it fails. 


This algorithm is successful because another agent in the system writing to the cache
line between the load-lock and the store-conditional instructions would make the cache
line invalid. This mechanism’s coherence is based on the following four items:


1. LDx_L instructions are processed in-order in relation to the associated STx_C. 


2. Once a block is locked by way of an LDx_L instruction, no internal agent can evict 
the block from the Dcache as a side-effect of its processing. 


3. Any external agent that intends to update the contents of the stored block must use 
an invalidating probe command to inform the 21264A. 


4. The system is the only agent with sufficient information to manage the tasks of fair- 
ness and liveness. However, to enable these tasks, the 21264A only generates exter- 
nal commands for nonspeculative STx_C instructions, and once given a success 
indication from the system, must faithfully update the Dcache with the STx_C 
value. 


The system is entirely responsible for item number three. The 21264A plays an active 
role in items one, two, and four. 
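

The idea that the Dcache line itself acts as the lock flag can be illustrated with a toy Python model. This is a deliberately simplified sketch: the class and method names are hypothetical, only block presence is tracked, and in-order issue, speculation, and the set-dirty flow are not modeled.

```python
BLOCK_MASK = ~0x3F   # 64-byte cache blocks


class LockFlagModel:
    """Toy model in which the lock is the presence of the block in the Dcache."""

    def __init__(self):
        self.dcache_blocks = set()

    def ldx_l(self, addr: int) -> None:
        # A load-lock leaves the block resident in the Dcache.
        self.dcache_blocks.add(addr & BLOCK_MASK)

    def invalidating_probe(self, addr: int) -> None:
        # An external writer must first invalidate the block (item 3 above).
        self.dcache_blocks.discard(addr & BLOCK_MASK)

    def stx_c(self, addr: int, can_make_writable: bool = True) -> bool:
        # Succeeds only if the block is still present and can be made writable.
        return (addr & BLOCK_MASK) in self.dcache_blocks and can_make_writable
```

In this model a store-conditional that follows an invalidating probe to the same block fails, because the block is no longer resident.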


4.6.1 In-Order Processing of LDx_L/STx_C Instructions 


The 21264A uses the stWait logic in the IQ to ensure that LDx_L/STx_C pairs are
issued in order. The stWait logic treats an LDx_L instruction like an STx instruction.
STx_C instructions are always loaded into the IQ with their associated stWait bit set.
Thus, a STx_C instruction is not issued until the older LDx_L is out of the IQ.


4.6.2 Internal Eviction of LDx_L Blocks 


The 21264A prevents the eviction of cache blocks in the Dcache due to either of the fol- 
lowing references: 


• Istream references with a Bcache index that matches the Dcache block and a
  Bcache tag that mismatches the Dcache block.


  To avoid evictions of LDx_L blocks, Istream references that match the index of a
  block in the Dcache are converted to noncached references.


• LDx or STx references with a Dcache index that matches the block.


In the Alpha Architecture, Dstream references between a LDx_L/STx_C pair force 
the value of the STx_C success flag to be UNPREDICTABLE. The 21264A forces
all STx_C instructions that interrupt an LDx_L/STx_C pair to fail in program order. 


There should be no Dstream references between LDx_L/STx_C pairs; however, the 
out-of-order nature of the 21264A can introduce Dstream references between 
LDx_L/STx_C pairs. To prevent load or store instructions older than the LDx_L 
from evicting the LDx_L cache block, the Mbox invokes a replay trap on the 
incoming load or store instruction, which also aborts the LDx_L. These instructions
are issued in program order in the next iteration of the trap retry down the pipeline. 
To prevent newer load or store instructions from evicting the locked cache line, the 
Ibox ensures that a STx_C is issued before any newer load or store instruction by 
placing the STx_C into the IQ and stalling all subsequent instructions in the map 
stage of the pipe until the IQ is empty. 


Branch instructions between the LDx_L/STx_C pair may be mispredicted, intro- 
ducing load and store instructions that evict the locked cache block. To prevent that 
from happening, there is a bit in the instruction fetcher that is set for a LDx_L refer- 
ence and cleared on any other memory reference. When this bit is set, the branch 
predictor predicts all branches to fall through. 


4.6.3 Liveness and Fairness 


To prevent a livelock condition, the 21264A processes the STx_C as follows: 


1. If a STx_C misses the Dcache, then no system port transaction is started and the
STx_C fails.


2. If a STx_C hits a block that is not dirty, then a ChangeToDirty (Shared or Clean) is
launched after the STx_C retires and all older store queue entries are in the writable
state. This ensures that once the ChangeToDirty command is launched on behalf of
the STx_C, the STx_C will be executed to completion if the ChangeToDirty command
succeeds.


If the ChangeToDirty command succeeds, the STx_C enters the writable state, and the 
Mbox locks the Dcache line. The Mbox does not release the Dcache line until the 
STx_C data is transferred to the Dcache. This ensures that no other agent, by way of a 
probe, can take the block before the STx_C can update the locked block. 


4.6.4 Managing Speculative Store Issues with Multiprocessor Systems 


The 21264A provides two mechanisms to manage an inherent potential side effect of 
speculative execution with multiprocessor systems — a livelock condition caused by a 
speculative store that misses in one processor affecting the execution of a LDx_L/ 
STx_C pair in another processor. The potential livelock condition in multiprocessor 
systems can be effectively controlled by placing processors in a conservative mode, 
where speculative store MAFs are blocked. The 21264A manages conservative mode 
with the Mbox IPR, M_CTL[SMC], described in Table 5-19. 


• M_CTL[SMC] can be set to place the 21264A in full-time conservative mode.


• M_CTL[SMC] can be set to place the 21264A in periodic conservative mode,
  timed by two counters: an 8-bit primary counter that tracks branch mispredicts and
  conditional branch retires, and a backup counter that places the 21264A in
  conservative mode for a period of 16K cycles every 2 million cycles.


The 8-bit counter is enabled by placing M_CTL[SMC] in periodic conservative 
mode. The backup counter takes effect whenever the 8-bit counter is enabled. Fur- 
ther, the backup counter can be reset to 0 by clearing a previously set 
M_CTL[SMC], allowing synchronization between processors. 
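

The cost of the backup counter can be estimated with simple arithmetic, sketched below in Python. The text gives "16K cycles every 2 million cycles" without saying whether 2 million is decimal or a power of two, so both readings are computed; the function name is hypothetical.

```python
from fractions import Fraction

CONSERVATIVE_CYCLES = 16 * 1024   # "16K cycles" read as 16384


def conservative_fraction(period_cycles: int) -> Fraction:
    """Fraction of time spent in backup-counter conservative mode."""
    return Fraction(CONSERVATIVE_CYCLES, period_cycles)


# Reading "2 million" as 2**21 cycles (an assumption) gives exactly 1/128.
power_of_two_reading = conservative_fraction(2 ** 21)
# Reading "2 million" as 2,000,000 cycles (also an assumption) gives 128/15625.
decimal_reading = conservative_fraction(2_000_000)
```

Under either reading the backup counter keeps the processor in conservative mode for a bit under one percent of the time.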
4.7 System Port


The system port is the 21264A’s connection to either a memory or I/O controller or to a 
shared multiprocessor system controller. System port interface signals are shown in 
Figure 4-4.


The system port supports transactions between the 21264A and the system. Systems 
must receive and drive signals that are asserted low. Transaction commands are com- 
municated on signal lines SysAddOut_L[14:0] (21264A-to-system) and 
SysAddIn_L[14:0] (system-to-21264A). Transaction data is transferred on a bidirec- 
tional data bus over pins SysData_L[63:0] with ECC on pins SysCheck_L[7:0]. 


Figure 4-4 System Interface Signals

[Figure not reproduced. Signals shown on the 21264A system interface:]

SysAddIn_L[14:0]
SysAddInClk_L
SysAddOut_L[14:0]
SysAddOutClk_L
SysVref
SysData_L[63:0]
SysCheck_L[7:0]
SysDataInClk_H[7:0]
SysDataOutClk_L[7:0]
SysDataInValid_L
SysDataOutValid_L
SysFillValid_L
IRQ_H[5:0]


4.7.1 System Port Pins


Table 3-1 defines the 21264A signal types referred to in this section. Table 4—6 lists the 
system port pin groups along with their type, number, and functional description. 


Table 4-6 System Port Pins

Pin Name              Type      Count  Description
IRQ_H[5:0]            I_DA      6      These six interrupt signal lines may be asserted by the system.
SysAddIn_L[14:0]      I_DA      15     Time-multiplexed SysAddIn, system-to-21264A.
SysAddInClk_L         I_DA      1      Single-ended forwarded clock from system for
                                       SysAddIn_L[14:0] and SysFillValid_L.
SysAddOut_L[14:0]     O_OD      15     Time-multiplexed SysAddOut, 21264A-to-system.
SysAddOutClk_L        O_OD      1      Single-ended forwarded clock.
SysVref               I_DC_REF  1      System interface reference voltage.
SysCheck_L[7:0]       B_DA_OD   8      Quadword ECC check bits for SysData_L[63:0].
SysData_L[63:0]       B_DA_OD   64     Data bus for memory and I/O data.
SysDataInClk_H[7:0]   I_DA      8      Single-ended system-generated clocks for clock forwarded
                                       input system data.
SysDataInValid_L      I_DA      1      When asserted, marks a valid data cycle for data transfers to
                                       the 21264A.
SysDataOutClk_L[7:0]  O_OD      8      Single-ended 21264A-generated clocks for clock forwarded
                                       output system data.
SysDataOutValid_L     I_DA      1      When asserted, marks a valid data cycle for data transfers
                                       from the 21264A.
SysFillValid_L        I_DA      1      Validation for fill given in previous SysDc command.


4.7.2 Programming the System Interface Clocks 


The system forwarded clocks are free running and derived from the 21264A GCLK. 
The period of the system forwarded clocks is controlled by three Cbox CSRs, based on 
the bit-rate ratio (similar to the Bcache bit-rate ratio) except that all transfers are
dual-data.


• SYS_CLK_LD_VECTOR[15:0]
• SYS_BPHASE_LD_VECTOR[3:0]
• SYS_FDBK_EN[7:0]


Table 4-7 lists the programming values used to program the system interface clocks.


Table 4-7 Programming Values for System Interface Clocks

System Transfer  SYS_CLK_LD_VECTOR¹  SYS_BPHASE_LD_VECTOR¹  SYS_FDBK_EN¹

1.5X-DD          9249                5                      02
2.0X-DD          3333                0                      01
2.5X-DD          8C63                5                      02
3.0X-DD          71C7                0                      10
3.5X-DD          C387                A                      04
4.0X-DD          0F0F                0                      01
5.0X-DD          7C1F                0                      40
6.0X-DD          F03F                0                      10
7.0X-DD          C07F                0                      04
8.0X-DD          00FF                0                      01

1 These are hexadecimal values.
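
As an illustration only, the values in Table 4-7 can be held in a lookup keyed by the system transfer ratio. The following Python sketch is not part of this specification: the names SYSTEM_CLOCK_CSRS and clock_csrs_for are hypothetical, and the values are transcribed from Table 4-7 on the assumption that they are read as hexadecimal.

```python
# Values transcribed from Table 4-7; every entry is hexadecimal in the table.
# Key: system transfer ratio (all transfers are dual-data).
# Value: (SYS_CLK_LD_VECTOR[15:0], SYS_BPHASE_LD_VECTOR[3:0], SYS_FDBK_EN[7:0]).
SYSTEM_CLOCK_CSRS = {
    "1.5X": (0x9249, 0x5, 0x02),
    "2.0X": (0x3333, 0x0, 0x01),
    "2.5X": (0x8C63, 0x5, 0x02),
    "3.0X": (0x71C7, 0x0, 0x10),
    "3.5X": (0xC387, 0xA, 0x04),
    "4.0X": (0x0F0F, 0x0, 0x01),
    "5.0X": (0x7C1F, 0x0, 0x40),
    "6.0X": (0xF03F, 0x0, 0x10),
    "7.0X": (0xC07F, 0x0, 0x04),
    "8.0X": (0x00FF, 0x0, 0x01),
}


def clock_csrs_for(ratio):
    """Return the three clock CSR values for a ratio such as "2.5X"."""
    clk_ld, bphase_ld, fdbk_en = SYSTEM_CLOCK_CSRS[ratio]
    # Sanity-check each value against the field width given in its CSR name.
    if clk_ld >= 1 << 16 or bphase_ld >= 1 << 4 or fdbk_en >= 1 << 8:
        raise ValueError("value does not fit its CSR field")
    return {
        "SYS_CLK_LD_VECTOR": clk_ld,
        "SYS_BPHASE_LD_VECTOR": bphase_ld,
        "SYS_FDBK_EN": fdbk_en,
    }
```

A caller would pass the transfer ratio as a string, for example clock_csrs_for("3.5X"), and write the three returned values to the corresponding Cbox CSRs by whatever mechanism the platform provides.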
In addition to programming the clock CSRs, the data-sample/drive Cbox CSRs at the
pads must be set appropriately. Table 4-8 shows the programmed values for these
system CSRs. In Table 4-8, each system forwarded clock is the inversion of the
low-assertion signal at the corresponding pin.


Table 4-8 Program Values for Data-Sample/Drive CSRs

CBOX CSR                  Description

SYS_DDM_FALL_EN[0]        Enables the update of 21264A system outputs based on the falling
                          edge of the system forwarded clock. (Always asserted)
SYS_DDM_RISE_EN[0]        Enables the update of 21264A system outputs based on the rising
                          edge of the system forwarded clock. (Always asserted)
SYS_DDM_RD_FALL_EN[0]     Enables the sampling of incoming data on the falling edge of the
                          incoming forwarded clock. (Always asserted)
SYS_DDM_RD_RISE_EN[0]     Enables the sampling of incoming data on the rising edge of the
                          incoming forwarded clock. (Always asserted)
SYS_DDMF_ENABLE           Enables the falling edge of the system forwarded clock.
                          (Always asserted)
SYS_DDMR_ENABLE           Enables the rising edge of the system forwarded clock.
                          (Always asserted)


Table 4-9 lists the program values for CSR SYS_FRAME_LD_VECTOR[4:0] that set 
the ratio between the forwarded clocks and the frame clock. 


Table 4-9 Forwarded Clocks and Frame Clock Ratio

Clock Ratio  Transfer Mode                             Value¹

1:1          All                                       00
2:1          3.0X, 3.5X, 8.0X                          1E
2:1          1.5X, 2.0X, 2.5X, 4.0X, 5.0X, 6.0X, 7.0X  1F
4:1          8X                                        15
4:1          1.5X, 4.0X, 5.0X, 6.0X, 7.0X              0B
4:1          3.0X, 3.5X                                14
4:1          2.0X, 2.5X                                0A

1 These are hexadecimal values.


4.7.3 21264A-to-System Commands 


This section describes the format and operation of the 21264A-to-system commands. The
command, address, ID, and mask bits are transmitted in four consecutive cycles on
SysAddOut_L[14:0]. The 21264A sends the command information in one of the following
two modes, as selected by the Cbox CSR bit SYSBUS_FORMAT[0]:


• Bank interleave on cache block boundary mode: SYSBUS_FORMAT[0] = 0
• Page hit mode: SYSBUS_FORMAT[0] = 1


The physical address (PA) bit arrangements for the two modes are shown in Tables 4-10
and 4-11. The purpose of the two modes is to give the system the PA bits that allow it to
select the memory bank and drive the RAS address as soon as possible.
4.7.3.1 Bank Interleave on Cache Block Boundary Mode


Table 4-10 shows the command format for the bank interleave on cache block boundary
mode of operation (21264A-to-system).


Table 4-10 Bank Interleave on Cache Block Boundary Mode of Operation

         SysAddOut_L[14:2]                      SysAddOut_L[1]  SysAddOut_L[0]

Cycle 1  M1, Command[4:0], PA[34:28]            PA[36]          PA[38]
Cycle 2  PA[27:22], PA[12:6]                    PA[35]          PA[37]
Cycle 3  M2, Mask[7:0], CH, ID[2:0]             PA[40]          PA[42]
Cycle 4  RV, PA[21:13], PA[5:3]                 PA[39]          PA[41]


4.7.3.2 Page Mode Hit 
Table 4-11 shows the command format for page hit mode (21264A-to-system).


Table 4-11 Page Hit Mode of Operation

(table body illegible in source)


Table 4-12 describes the field definitions for Tables 4-10 and 4-11.


Table 4-12 21264A-to-System Command Fields Definitions

SysAddOut Field   Definition

M1                When set, reports a miss to the system for the oldest probe.
                  When clear, has no meaning.

Command[4:0]      The 5-bit command field is defined in Table 4-14.

SysAddOut[1:0]    This field is needed for systems with greater than 32GB of memory, up to
                  a maximum of 8 terabytes (8TB). Cost-focused systems can tie these bits
                  high and use a 13-bit command/address field.

M2                When set, reports that the oldest probe has missed in cache. Also, this
                  bit is set for system-to-21264A probe commands that hit but have no data
                  movement (see the CH bit, below).
                  When clear, has no meaning.
                  M1 and M2 are not asserted simultaneously. Reporting probe results as
                  soon as possible is critical to high-speed operation, so when a result
                  is known the 21264A uses the earliest opportunity to send an M signal to
                  the system. M bit assertion can occur either in a valid command or a
                  NZNOP.

ID[2:0]           The ID number for the MAF, VDB, or WIOB associated with the command.

RV                If set, validates this command.
                  In speculative read mode (optional), RV = 1 validates the command and
                  RV = 0 indicates a NOP.
                  For all nonspeculative commands RV = 1.

Mask[7:0]         The byte, LW, or QW mask field for the corresponding I/O commands.

CH                The cache hit bit is asserted, along with M2, when probes with no data
                  movement hit in the Dcache or Bcache. This response can be generated by
                  a probe that explicitly indicates no data movement or a ReadIfDirty
                  command that hits on a valid but clean or shared block.


System designers can minimize pin count for systems with a small memory by configuring
both the bank interleave on cache block boundary mode and the page hit mode
formats into a short bus format. In this format, the pins SysAddOut_L[1] and/or
SysAddOut_L[0] are not used (selected by Cbox CSR SYS_BUS_SIZE[1:0]). Table 4-13 lists
the values for SYSBUS_FORMAT and SYS_BUS_SIZE[1:0] and shows the maximum physical
memory size.


Table 4-13 Maximum Physical Address for Short Bus Format

SYSBUS_FORMAT  SYS_BUS_SIZE[1:0]  Maximum PA  Comment

0              00                 42          Bank interleave + full address
0              01                 36          Bank interleave + SysAddOut_L[0] unused
0              10                 Illegal     Illegal combination
0              11                 34          Bank interleave + both SysAddOut_L[1:0] are unused
1              00                 38          Page mode hit + full address
1              01                 36          Page mode hit + SysAddOut_L[0] unused
1              10                 Illegal     Illegal combination
1              11                 34          Page mode hit + both SysAddOut_L[1:0] are unused


Because addresses above the maximum PA are not visible to the external system, any
memory transaction generated to an address above the maximum PA is detected,
converted to a transaction to NXM (nonexistent memory), and processed internally by
the 21264A.
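
The effect of Table 4-13 can be sketched in a few lines of Python. This is an illustration, not the 21264A's implementation: the names MAX_PA_BIT and is_nxm are hypothetical, and the sketch assumes that the "Maximum PA" column gives the highest physical address bit visible on the pins and that SYSBUS_FORMAT = 1 selects the page hit rows.

```python
# (SYSBUS_FORMAT, SYS_BUS_SIZE[1:0]) -> highest PA bit visible to the system,
# transcribed from Table 4-13. The 0b10 size encodings are illegal and omitted.
MAX_PA_BIT = {
    (0, 0b00): 42,
    (0, 0b01): 36,
    (0, 0b11): 34,
    (1, 0b00): 38,
    (1, 0b01): 36,
    (1, 0b11): 34,
}


def is_nxm(pa, sysbus_format, sys_bus_size):
    """Return True if pa lies above the largest externally visible address."""
    key = (sysbus_format, sys_bus_size)
    if key not in MAX_PA_BIT:
        raise ValueError("illegal SYSBUS_FORMAT/SYS_BUS_SIZE combination")
    # First address that cannot be expressed with the visible PA bits.
    limit = 1 << (MAX_PA_BIT[key] + 1)
    return pa >= limit
```

Under these assumptions, a configuration with both SysAddOut_L[1:0] unused (maximum PA of 34) treats any address at or above 1 << 35, that is 32GB, as nonexistent memory.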


4.7.4 21264A-to-System Commands Descriptions 


Table 4-14 describes the 21264A-to-system commands.


Table 4-14 21264A-to-System Commands Descriptions

Command             Command[4:0]  Function

NOP                 00000         The 21264A drives this command on idle cycles during reset.
                                  After the clock forward reset period, the first NZNOP is
                                  generated and this command is no longer generated.
ProbeResponse       00001         Returns probe status and ID number of the VDB entry holding
                                  the requested cache block.
NZNOP               00010         This nonzero NOP helps to parse the command packet.
VDBFlushRequest     00011         VDB flush request. The 21264A sends this command to the
                                  system when an internally generated transaction Bcache index
                                  matches a Bcache victim or probe in the VDB. The system
                                  should flush VDB entries associated with all probe and
                                  WrVictimBlk transactions that occurred before this command.
MB¹                 00111         Indicates an MB was issued, optional when Cbox CSR
                                  SYSBUS_MB_ENABLE[0] is set.
ReadBlk             10000         Memory read.
ReadBlkMod          10001         Memory read with modify intent.
ReadBlkI            10010         Memory read for Istream.
FetchBlk            10011         Noncached memory read.
ReadBlkSpec²        10100         Speculative memory read (optional).
ReadBlkModSpec²     10101         Speculative memory read with modify intent (optional).
ReadBlkSpecI²       10110         Memory read for Istream (optional).
FetchBlkSpec²       10111         Speculative memory noncached ReadBlk (optional).
ReadBlkVic³         11000         Memory read with a victim (optional).
ReadBlkModVic³      11001         Memory read with modify intent, with a victim (optional).
ReadBlkVicI³        11010         Memory read for Istream with a victim (optional).
WrVictimBlk         00100         Write-back of dirty block.
CleanVictimBlk      00101         Address of a clean victim (optional).
Evict⁴              00110         Invalidate evicted block at the given Bcache index (optional).
ReadBytes           01000         I/O read, byte mask.
ReadLWs             01001         I/O read, longword mask.
ReadQWs             01010         I/O read, quadword mask.
WrBytes             01100         I/O write, byte mask.
WrLWs               01101         I/O write, longword mask.
WrQWs               01110         I/O write, quadword mask.
CleanToDirty⁶       11100         Sets a block dirty that was previously clean (optional for
                                  duplicate tags).
SharedToDirty⁶      11101         Sets a block dirty that was previously shared (optional for
                                  multiprocessor systems).
STCChangeToDirty⁶   11110         Sets a block dirty that was previously clean or shared for a
                                  STx_C instruction (optional for multiprocessor systems).
InvalToDirtyVic³,⁵  11011         Invalid to dirty with a victim (optional).
InvalToDirty⁵       11111         WH64; acts like a ReadBlkMod without the fill cycles
                                  (optional).
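
For readers who model the system port in software, the Command[4:0] encodings of Table 4-14 map naturally onto an enumeration. The Python sketch below is illustrative only: the class name SysAddOutCommand and the helper decode_command are hypothetical, and returning None for the two encodings that Table 4-14 does not list (01011 and 01111) is an assumption, since the table does not say how they are treated.

```python
from enum import IntEnum


class SysAddOutCommand(IntEnum):
    """Command[4:0] encodings transcribed from Table 4-14."""
    NOP = 0b00000
    ProbeResponse = 0b00001
    NZNOP = 0b00010
    VDBFlushRequest = 0b00011
    WrVictimBlk = 0b00100
    CleanVictimBlk = 0b00101
    Evict = 0b00110
    MB = 0b00111
    ReadBytes = 0b01000
    ReadLWs = 0b01001
    ReadQWs = 0b01010
    WrBytes = 0b01100
    WrLWs = 0b01101
    WrQWs = 0b01110
    ReadBlk = 0b10000
    ReadBlkMod = 0b10001
    ReadBlkI = 0b10010
    FetchBlk = 0b10011
    ReadBlkSpec = 0b10100
    ReadBlkModSpec = 0b10101
    ReadBlkSpecI = 0b10110
    FetchBlkSpec = 0b10111
    ReadBlkVic = 0b11000
    ReadBlkModVic = 0b11001
    ReadBlkVicI = 0b11010
    InvalToDirtyVic = 0b11011
    CleanToDirty = 0b11100
    SharedToDirty = 0b11101
    STCChangeToDirty = 0b11110
    InvalToDirty = 0b11111


def decode_command(bits):
    """Return the command name for a 5-bit field, or None if it is not listed."""
    try:
        return SysAddOutCommand(bits & 0x1F).name
    except ValueError:
        return None  # assumption: encodings absent from Table 4-14
```

For example, decode_command(0b10001) yields the name ReadBlkMod.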


Table 4-14 footnotes:

1. Systems can optionally enable MB instructions to the external system by asserting
Cbox CSR SYSBUS_MB_ENABLE. This mode is described in Section 2.12.1.


2. To minimize load-to-use memory latency, systems can optionally enable speculative
transactions to memory space by asserting the Cbox CSR
SPEC_READ_ENABLE[0]. If the Cbox system command queue is empty, a
bypass between the Bcache interface and the system interface is enabled (in combination
with this mode). When the next new transaction is delivered by the Mbox,
the Cbox starts MAF memory references to the system interface before the result
of the Bcache hit is known. The RV bit is deasserted on a Bcache hit, or in
BC_RDVICTIM[0] mode (see footnote 3, below), and for Bcache miss transactions
that generate a victim (clean or dirty). Otherwise, the RV bit is asserted.


3. Systems can optionally enable RdBlkVic, RdBlkModVic, and InvalToDirtyVic
commands using Cbox CSR BC_RDVICTIM[0]. In this mode of operation
RdBlkxVic command cycles are always followed immediately by the WrVictimBlk
commands. Also, when CleanVictimBlk commands are enabled, they
immediately follow RdBlkVic, RdBlkModVic, and InvalToDirtyVic commands.


4. Systems can optionally enable Evict commands by asserting the Cbox CSR
ENABLE_EVICT. In this mode, all ECB instructions will generate an Evict command,
and in combination with BC_RDVICTIM[0] mode, the WriteVictim or
CleanVictim (when Cbox CSR BC_CLEAN_VICTIM[0] is asserted) associated
with the Evict command is atomically sent after the Evict command.


5. Optionally, systems can enable InvalToDirty commands by programming Cbox
CSR INVAL_TO_DIRTY_ENABLE[1:0]. Table 4-15 shows how to program
INVAL_TO_DIRTY_ENABLE[1:0].


Table 4-15 Programming INVAL_TO_DIRTY_ENABLE[1:0]

INVAL_TO_DIRTY_ENABLE[1:0]  Cbox Action

X0                          WH64 instructions are converted to RdModx commands at the
                            interface. Beyond this point, no other agent sees the WH64
                            instruction. This mode is useful for microprocessors that do
                            not want to support InvalToDirty transactions.
01                          WH64 instructions are enabled, but they are acknowledged within
                            the 21264A.
11                          WH64 instructions are enabled, and generate InvalToDirty
                            transactions at the 21264A pins.


6. Optionally, systems can enable CleanToDirty or SharedToDirty commands by
using Cbox CSR SET_DIRTY_ENABLE[2:0]. These three bits control the Cbox
action upon a block that was hit in the Dcache with a status of dirty/shared,
clean/shared, or clean, respectively.


Table 4-16 Programming SET_DIRTY_ENABLE[2:0]

SET_DIRTY_ENABLE[2:0]
(DS,CS,C)              Cbox Action

000                    Everything acknowledged internally (uniprocessor).
001                    Only clean blocks generate external acknowledge (CleanToDirty
                       commands only).
010                    Only clean/shared blocks generate external acknowledge
                       (SharedToDirty commands only).
011                    Clean and clean/shared blocks generate external acknowledge.
100                    Only dirty/shared blocks generate external acknowledge
                       (SharedToDirty commands only).
101                    Only dirty/shared and clean blocks generate external acknowledge.
110                    Only dirty/shared and clean/shared blocks generate external
                       acknowledge.
111                    All transactions generate external acknowledge.


7. Systems that require an explicit indication of ChangeToDirty status changes initiated
by STx_C instructions can assert Cbox CSR STC_ENABLE[0]. When this
register field = 000, CleanToDirty and SharedToDirty commands are used. The
distinction between a ChangeToDirty command generated by a STx_C instruction and
one generated by a STx instruction is important to systems that want to service
ChangeToDirty commands with dirty data from a source processor. In this case, the
distinction between a locked exclusive instruction and a normal instruction is critical
to avoid livelock for a LDx_L/STx_C sequence.
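
The encoding in Table 4-16 amounts to one enable bit per block state. As a minimal sketch, assuming the (DS,CS,C) ordering shown in the table heading maps to bits 2, 1, and 0 respectively, the check can be written as follows. The names STATE_BIT and needs_external_ack are hypothetical and are not 21264A terminology.

```python
# Bit position within SET_DIRTY_ENABLE[2:0] for each Dcache block state,
# following the (DS, CS, C) ordering given for Table 4-16.
STATE_BIT = {
    "dirty/shared": 2,
    "clean/shared": 1,
    "clean": 0,
}


def needs_external_ack(set_dirty_enable, block_state):
    """True if a hit on a block in block_state generates an external command."""
    return bool((set_dirty_enable >> STATE_BIT[block_state]) & 1)
```

With SET_DIRTY_ENABLE = 001, only a hit on a clean block returns True, which matches the "CleanToDirty commands only" row; with 000 every state returns False, the uniprocessor case.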


4.7.5 ProbeResponse Commands (Command[4:0] = 00001) 


The 21264A responds to system probes that did not miss with a 4-cycle transfer on 
SysAddOut_L[14:0]. As shown in Table 4-14, the Command[4:0] field for a ProbeRe- 
sponse command equals 00001. Table 4-17 shows the format of the 21264A ProbeRe- 
sponse command. 


Table 4-17 21264A ProbeResponse Command

         SysAddOut_L[14:2]                      SysAddOut_L[1]  SysAddOut_L[0]

Cycle 1  00001, Status[1:0], DM, VS, VDB[2:0]
Cycle 2  MS, MAF[2:0], X

(remaining cells and the rows for cycles 3 and 4 are illegible in source)


Table 4-18 describes the ProbeResponse command fields.


Table 4-18 ProbeResponse Fields Descriptions

ProbeResponse Field  Description

Command[4:0]         The value 00001 identifies the command as a ProbeResponse.
DM                   Indicates that data movement should occur (copy of probe valid bit).
                     See Section 4.4.
VS                   Write victim sent bit.
VDB[2:0]             ID number of the VDB entry containing the requested cache block. This
                     field is valid when either the DM bit or the VS bit equals 1.
MS                   MAF address sent.
MAF[2:0]             This field indicates the SharedToDirty, CleanToDirty, or
                     STCChangeToDirty MAF entry that matched the full probe address.
Status[1:0]          Result of probe:
                     Status[1:0]  Probe state
                     00           HitClean
                     01           HitShared
                     10           HitDirty
                     11           HitSharedDirty


The system uses the SysDc signal lines to retrieve data for probes that requested a cache 
block from the 21264A. See Section 4.7.7.2 for more information about 2-cycle data 
transfer commands. Probes that respond with M1, M2, or CH=1 will not be reported to 
the system in a probe response command. 


4.7.6 SysAck and 21264A-to-System Commands Flow Control 


Controlling the flow of 21264A-to-system commands is a joint task of the 21264A and
the system. The flow is controlled using the A bit, which is asserted by the system, and
the Cbox CSR SYSBUS_ACK_LIMIT[4:0] counter. The counter has the following
properties:


• The 21264A increments its command-outstanding counter when it sends a command
  to the system. The 21264A decrements the counter by one each time the A bit
  (SysAddIn_L[14]) is asserted in a system-to-21264A command. The A bit is
  transmitted during cycle four of a probe mode command or during cycle two of a SysDc
  command.


• The 21264A stops sending new commands when the counter hits the maximum
  count specified by Cbox CSR SYSBUS_ACK_LIMIT[4:0]. When this counter is
  programmed to zero, the CMD_ACK count is ignored (unlimited commands are
  allowed in-flight).


• Because RdBlkxVic and WrVictimBlk commands are atomic when the CSR
  BC_RDVICTIM[0] is set, the 21264A does not send a RdBlkxVic command if the
  SYSBUS_ACK_LIMIT[4:0] is equal to one less than the maximum outstanding
  count. The limit cannot be programmed with a value of one when RdBlkxVic
  commands are enabled unless the Cbox CSR RDVIC_ACK_INHIBIT command is also
  asserted (see Table 5-24).


• There is no mechanism for the system to reject a 21264A-to-system command.
  ProbeResponse, VDBFlushReq, NOP, NZNOP, and RdBlkxSpec (with a clear RV
  bit) commands do not require a response from the system. Systems must provide
  adequate resources for responses to all probes sent to the 21264A.


• Systems that program the Cbox CSR BC_RDVICTIM[0] to immediately follow
  victim write transactions with read transactions and allocate combined resources
  for the pair, may find it useful to increment the SYSBUS_ACK_LIMIT[4:0]
  counter only once for the pair. These systems may assert Cbox CSR
  RDVIC_ACK_INHIBIT, which does not increment the
  SYSBUS_ACK_LIMIT[4:0] count for RdBlkVic, RdBlkModVic, and RdBlkVicI
  commands.


• Systems that maintain victim data buffers may find it useful to limit the number of
  outstanding WrVictimBlk commands. This can be accomplished by using the Cbox
  CSR SYSBUS_VIC_LIMIT[2:0]. When the number of outstanding WrVictim
  commands or CleanVictim commands reaches this programmed limit, the Cbox
  stops generating victim commands on the system port. Because victim and read
  commands are atomic when BC_RDVICTIM[0] = 1, the RdBlkxVic commands are
  stalled when the victim limit is reached. Programming the
  SYSBUS_VIC_LIMIT[2:0] to zero disables this limit.
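
The basic increment/decrement behavior described in the first two properties above can be modeled as a small counter. The Python sketch below is a toy model under stated assumptions, not a description of the Cbox logic: the class and method names are hypothetical, the counter is assumed to saturate at zero on a spurious acknowledge, and the special cases in the later properties (commands that need no response, RdBlkxVic pairing, RDVIC_ACK_INHIBIT, and the victim limit) are deliberately ignored.

```python
class CommandFlowControl:
    """Toy model of the command-outstanding counter and its limit."""

    def __init__(self, ack_limit):
        # Value programmed into SYSBUS_ACK_LIMIT[4:0]; zero means no limit.
        self.ack_limit = ack_limit
        self.outstanding = 0

    def can_send(self):
        """True if a new command may be sent under the programmed limit."""
        return self.ack_limit == 0 or self.outstanding < self.ack_limit

    def send_command(self):
        """Account for one command sent to the system."""
        if not self.can_send():
            raise RuntimeError("limit reached; wait for the system to assert A")
        self.outstanding += 1

    def ack(self):
        """Account for one A bit asserted in a system-to-21264A command."""
        # Assumption: an acknowledge with nothing outstanding is ignored.
        if self.outstanding > 0:
            self.outstanding -= 1
```

With a limit of two, the model allows two commands, blocks the third, and allows it again after one acknowledge.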


4.7.7 System-to-21264A Commands 


The system can send either probes (4-cycle) or data movement (2-cycle) commands to 
the 21264A. Signal pin SysAddIn_L[14] in the first command cycle indicates the type 
of command being sent (1 = probe, 0 = data transfer). Sections 4.7.7.1 and 4.7.7.2 
describe the formats of the two types of commands. 


4.7.7.1 Probe Commands (Four Cycles) 
Probes are always 4-cycle commands that contain a field to indicate a valid SysDc
command. The format of the 4-cycle command is shown below.


Note: The SysAddIn_L[1:0] signal lines are optional and are used for memory 
designs greater than 32GB. The position of the address bits matches the 
selected format of the SysAddOut bus. The example below shows the bank 
interleave format. 


Table 4-19 shows the format of the system-to-21264A probe commands. 


Table 4-19 System-to-21264A Probe Commands

(table body illegible in source)


Table 4-20 describes the fields of the system-to-21264A probe commands.


Table 4-20 System-to-21264A Probe Commands Fields Descriptions

SysAddIn_L[14:0]
Field             Description

Probe[4:0]        Probe type and next tag state (see Tables 4-21 and 4-22).
SysDc[4:0]        Controls data movement in and out of the 21264A. See Table 4-24 for a
                  list of data movement types.
RVB               Clears the victim or I/O write buffer (IOWB) valid bit specified in
                  ID[3:0].
RPB               Clears probe valid bit specified in ID[2:0].
A                 Command acknowledge. When set, the 21264A decrements its command
                  outstanding counter (SYSBUS_ACK_LIMIT[4:0]).
ID[3:0]           Identifies the victim data buffer (VDB) number or the I/O write buffer
                  (IOWB) number. Bit [3] is only asserted for the IOWB.
C                 Commit bit. This bit decrements the uncommitted event counter
                  (MB_CNTR) used for MB acknowledge.


The probe command field Probe[4:0] has two sections, Probe[4:3] and Probe[2:0].
Table 4-21 lists the data movement selected by Probe[4:3].


Table 4-21 Data Movement Selection by Probe[4:3]

Probe[4:3]  Data Movement Function

00          NOP
01          Read if hit, supply data to system if block is valid.
10          Read if dirty, supply data to system if block is valid/dirty.
11          Read anyway, supply data to the system at index of probe.


Table 4-22 lists the next cache block state selected by Probe[2:0].


Table 4-22 Next Cache Block State Selection by Probe[2:0]

Probe[2:0]  Next Tag State

000         NOP
001         Clean
010         Clean/Shared
011         Transition3¹: Clean => Clean/Shared
                          Dirty => Invalid
                          Dirty/Shared => Clean/Shared
100         Dirty/Shared
101         Invalid
110         Transition1²: Clean => Clean/Shared
                          Dirty => Dirty/Shared
111         Reserved

1 Transition3 is useful in nonduplicate tag systems that want to give writable status to the
  reader and do not know if the block is clean or dirty.

2 Transition1 is useful in nonduplicate tag systems that do not update memory on ReadBlk
  hits to a dirty block in another processor.
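
Splitting Probe[4:0] into its two sections is a matter of shifting and masking. The following Python sketch shows one way to do it; the dictionary and function names are hypothetical, and the strings are abbreviated from Tables 4-21 and 4-22.

```python
# Probe[4:3]: data movement function (Table 4-21).
DATA_MOVEMENT = {
    0b00: "NOP",
    0b01: "Read if hit",
    0b10: "Read if dirty",
    0b11: "Read anyway",
}

# Probe[2:0]: next tag state (Table 4-22).
NEXT_TAG_STATE = {
    0b000: "NOP",
    0b001: "Clean",
    0b010: "Clean/Shared",
    0b011: "Transition3",
    0b100: "Dirty/Shared",
    0b101: "Invalid",
    0b110: "Transition1",
    0b111: "Reserved",
}


def decode_probe(probe):
    """Split a 5-bit Probe field into (data movement, next tag state)."""
    movement = DATA_MOVEMENT[(probe >> 3) & 0b11]
    next_state = NEXT_TAG_STATE[probe & 0b111]
    return movement, next_state
```

For example, a Probe[4:0] value of 10101 decodes as a read-if-dirty probe that leaves the block invalid.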
The 21264A holds pending probe commands in an 8-entry deep probe queue. The system
must count the number of probes that have been sent and ensure that the probes do not
overrun the 21264A queue. The 21264A removes probes from the internal probe queue
when the probe response is sent.


The 21264A expects to hit in cache on a probe response, so it always fetches a cache 
block from the Bcache on system probes. This can become a performance problem for 
systems that do not monitor the Bcache tags, so the 21264A provides Cbox CSR 
PRB_TAG_ONLY[0], which only accesses Bcache tags for system probes. For a 
Bcache hit, the 21264A retries the probe reference to get the associated data. In this 
mode, the 21264A has a cache-hit counter that maintains some history of past cache hits 
in order to fetch the data with the tag in the cases where streamed transactions are being 
performed to the host processor. 


4.7.7.2 Data Transfer Commands (Two Cycles) 


Data transfer commands use a 2-cycle format on SysAddIn_L[14:0]. The SysDc[4:0]
field indicates success or failure for ChangeToDirty and MB commands, and error
conditions as shown in Table 4-24.


The pattern of data is controlled by the SysDataInValid_L and SysDataOutValid_L 
signals. These signals are valid each cycle of data transfer, indicating any gaps in the 
data cycle pattern. The SysDataInValid_L and SysDataOutValid_L signals are 
described in Section 4.7.8.4. Table 4-23 shows the format of the data transfer com- 
mand. 


Table 4-23 Data Transfer Command Format

         SysAddIn_L[14:2]                       SysAddIn_L[1]  SysAddIn_L[0]

Cycle 1  0, SysDc[4:0], RVB, RPB, A, ID[3:0]

(remaining cells and the row for cycle 2 are illegible in source)


Table 4-24 describes the SysDc[4:0] field.


Table 4-24 SysDc[4:0] Field Description

SysDc[4:0] Command     SysDc[4:0]  Description

NOP                    00000       NOP, SysData is ignored by the 21264A.
ReadDataError          00001       Data is returned for read commands. The system drives
                                   the SysData bus, I/O, or memory NXM.
ChangeToDirtySuccess   00100       No data. SysData is ignored by the 21264A. This command
                                   is also used for the InvalToDirty response.
ChangeToDirtyFail      00101       No data. SysData is ignored by the 21264A. This command
                                   is also used for the Evict response.
MBDone                 00110       Memory barrier operation completed.
ReleaseBuffer          00111       Command to alert the 21264A that the RVB, RPB, and ID
                                   field are valid.
ReadData               100xx       Data returned for read commands. The system drives
(System Wrap)                      SysData. The system uses SysDc[1:0] to control the wrap
                                   order. See Section 4.7.8.6 for a description of the data
                                   wrapping scheme.
ReadDataDirty          101xx       Data is returned for Rdx and RdModx commands. The ending
(System Wrap)                      tag status is dirty. The system uses SysDc[1:0] to
                                   define the wrap order.
ReadDataShared         110xx       Data is returned for read commands. The system drives
(System Wrap)                      the data. The tag is marked shared. The system uses
                                   SysDc[1:0] to control the wrap order.
ReadDataShared/Dirty   111xx       Data is returned for the RdBlk command. The ending tag
(System Wrap)                      status is Shared/Dirty. The system uses SysDc[1:0] to
                                   control the wrap order.
WriteData              010xx       Data is sent for 21264A write commands or system probes.
                                   The 21264A drives during the SysData cycles. The lower
                                   two bits of the command specify the octaword address
                                   around which the 21264A wraps the data.
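
The SysDc[4:0] encodings of Table 4-24 fall into two groups: fixed 5-bit codes, and codes whose upper three bits select the command while the lower two bits (shown as xx) carry the wrap information. A minimal Python sketch of a decoder along those lines follows. The names are hypothetical, and returning (None, None) for encodings that the table does not list is an assumption.

```python
# Encodings with all five bits fixed.
FIXED_SYSDC = {
    0b00000: "NOP",
    0b00001: "ReadDataError",
    0b00100: "ChangeToDirtySuccess",
    0b00101: "ChangeToDirtyFail",
    0b00110: "MBDone",
    0b00111: "ReleaseBuffer",
}

# Encodings selected by SysDc[4:2]; SysDc[1:0] carries the wrap bits.
WRAPPED_SYSDC = {
    0b100: "ReadData",
    0b101: "ReadDataDirty",
    0b110: "ReadDataShared",
    0b111: "ReadDataShared/Dirty",
    0b010: "WriteData",
}


def decode_sysdc(sysdc):
    """Return (command name, wrap bits or None) for a SysDc[4:0] value."""
    sysdc &= 0x1F
    if sysdc in FIXED_SYSDC:
        return FIXED_SYSDC[sysdc], None
    name = WRAPPED_SYSDC.get(sysdc >> 2)
    if name is None:
        return None, None  # assumption: encoding not listed in Table 4-24
    return name, sysdc & 0b11
```

For example, the value 10110 decodes as ReadDataDirty with wrap bits 10.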


The A bit in the first cycle indicates that the command is acknowledged. When A = 1, the 
21264A decrements its command outstanding counter, but the A bit is not necessarily 
related to the current SysDc command. 


Probe commands can combine a SysDc command along with MBDone. In that event,
the probe is considered ahead of the SysDc command. If the SysDc command allows
the 21264A to retire an instruction before an MB, or allows the 21264A itself to retire
an MB (SysDc is MBDone), that MB will not complete until the probe is executed.


The system can select the ending cache status for a cache fill operation by specifying 
the status in one of the following SysDc commands: 


ReadData (Clean)         ReadDataShared (Clean/Shared)
ReadDataDirty (Dirty)    ReadDataShared/Dirty (Shared/Dirty)


The system returns ReadDataShared or ReadData for ReadBlk commands, and ReadDataDirty
for a ReadMod command. Other combinations are possible, but should be used only
after a careful study of the situation.


The ChangeToDirtySuccess and ChangeToDirtyFail commands cannot be issued in the
shadow of SysDc cache fill commands (ReadDataError, ReadData, ReadDataDirty,
ReadDataShared, and ReadDataShared/Dirty). Each cache fill command allocates eight
cycles on the SysData bus. Systems are required to ensure that any future SysDc
commands do not cause conflicts with those eight SysData bus cycles. In addition, the
system must not issue ChangeToDirtySuccess or ChangeToDirtyFail commands in the six
SysAddIn cycles after any of the ReadDatax commands because doing so will overload
internal MAF resources in the 21264A.


Because of an internal 21264A constraint, a minimum memory latency of 
4x BCACHE_CLK_PERIOD is imposed. This latency is measured from A3 of the out- 
going command (the last cycle) to the delivery of the SysDc command to the processor. 


4.7.8 Data Movement In and Out of the 21264A 


There are two modes of operation for data movement in and out of the 21264A: fast
mode and fast mode disable. The data movement mode is selected using Cbox CSR
FAST_MODE_DISABLE[0]. Fast data mode allows movement of data from the
21264A to bypass protocol and achieve the lowest possible latency for probe data,
write victim data, and I/O write data. Rules and conditions for the two modes are listed
and described in Sections 4.7.8.2 and 4.7.8.3. Before discussing data movement
operation, 21264A clock basics are described in Section 4.7.8.1.


4.7.8.1 21264A Clock Basics 


The 21264A uses a clock forwarding technique to achieve very high bandwidth on its 
pin interfaces. The clock forwarding technique has three main principles: 


1. Local point-to-point transfers can be made safely, and at very high bandwidth, if the
   sender can provide the receiver with a forward clock (FWD_CLK) to latch the
   transmitted data at the receiver.


   - The SysAddOutClk_L and SysDataOutClk_L[7:0] pins provide the forwarding
     clocks for transfers out of the 21264A.


   - The SysAddInClk_L and SysDataInClk_H[7:0] pins provide the forwarding
     clocks for transfers into the 21264A.


2. If only one state element was used to capture the transmitted data, and the skew
   between the two clock systems was greater than the bit-rate of the transfer, the data
   valid time of the transmitted data would not be sufficient to safely transfer the
   latched data into the receiver's clock domain. In order to avoid this problem, the
   receiver provides a queue that is manipulated in the transmitter's time domain.
   Using this queue, the data valid window of the transmitted data is extended (to an
   arbitrary size based on the queue size), and the transfer to the receiver's clock
   domain can be safely made by delaying the unloading of this queue element beyond
   the skew between the two clock domains. The internal clock that unloads this queue
   is labelled INT_FWD_CLK. INT_FWD_CLK is timed at both the rising and falling
   edges of the external clock, thus appearing to run at twice the external clock's
   frequency.


3. The first two points provide the steady state basis for clock forwarded transfers;
   however, both the sender and receiver must be correctly initialized to enable
   coherent and predictable transfers. This clock initialization is performed during
   system initialization using the ClkFwdRst_H and FrameClk_H signals.


If both the sender and the receiver are sampling at the same rate, these three principles 
are sufficient to safely make point-to-point transfers using clock forwarding. However, 
it is often desirable for systems to align clock-forwarded transactions on a slower 
SYSCLK that is the basis of all non-processor system transactions. 


The 21264A supports three ratios for SYSCLK to INT_FWD_CLK:
one-to-one (1-1), two-to-one (2-1), and four-to-one (4-1). Using one of these ratios, the
21264A starts transactions on SYSCLK boundaries. This ratio is programmed into the
21264A using the Cbox CSR SYS_FRAME_LD_VECTOR[4:0]. This ratio is independent
of the frequency of FrameClk_H.


For data movement, the 21264A reacts to SysDc commands when they are resolved
into the 21264A's clock domain. This occurs when the 21264A's INT_FWD_CLK
unloads the SysDc command from the clock forwarding queue. This moment is determined
by the amount of delay programmed into the clock forwarding silo (by way of
Cbox CSR SYS_RCV_MUX_CNT_PRESET[1:0]). Thus, all the timing relationships
are relative to this unload point in time, which will be referred to as the point the
command is perceived by the 21264A.
4.7.8.2 Fast Data Mode


The 21264A is the default driver of the bidirectional SysData bus¹. As the 21264A is
processing WrVictim, ProbeResponse (only the hit case), and IOWB commands to the
system, accompanying data is made available at the clock-forwarded bus.


Because there is a bandwidth difference between address (4 cycles) and data (8 cycles) 
transfers, the 21264A tries to fully use fast data mode by delaying the next SysAddOut 
write command until a fast data mode slot is available on the SysDataOut bus. 


SysDc commands (cache fill or explicit write commands) that collide with the fast data
on the SysData bus have higher priority, and so may interrupt the successful completion
of the fast transfer. Systems are responsible for detecting and replaying all interrupted
fast transfers. There are no gaps in a fast transfer and no data wrapping (the first cycle
contains QW0, addressed by PA[5:3] = 000).


The system must release victim buffers, probe buffers, and IOWB entries by sending
a SysDc command with the appropriate RVB/RPB bit for both successful fast data
transfers and for transfers that have been replayed. Fast data transfers have two parts:


1. SysAddOut command with the probe response, WrVictim, or Wr(I/O) 


2. Data 


The command precedes data by at least one SYSCLK period. Table 4-25 shows the 
number of SYSCLK cycles between SysAddOut and SysData for all system clock 
ratios (clock forwarded bit times) and system framing clock multiples. 


Table 4-25 SYSCLK Cycles Between SysAddOut and SysData

                            GCLK/INT_FWD_CLK (Data Rate Ratio)
System Framing Clock Ratio  1.5X  2.0X  2.5X  3.0X  3.5X  4.0X  5.0X  6.0X  7.0X  8.0X

1                           4     3     2     2     2     2     1     1     1     1
2                           2     2     1     1     1     1     1     1     1     1
4                           1     1     1     1     1     1     1     1     1     1


Figure 4-5 shows a simple example of a fast transfer. The data rate ratio is 1.5X with a
4:1 SYSCLK to INT_FWD_CLK ratio.


1 The SysData bus contains SysData_L[63:0] and SysCheck_L[7:0]. 
In fast data mode, movement of data into the 21264A requires turning around the
SysData bus that is being actively driven by the 21264A. Given a SysDc fill command
(ReadDataError, ReadData, ReadDataShared, ReadDataShared/Dirty, ReadDataDirty),
the 21264A responds as follows:


1. Three GCLK cycles after perceiving the SysDc fill command, the 21264A turns off 
its drivers, interrupting any ongoing fast data write transactions. 


2. The 21264A drivers stay off until the last piece of fill data is received, or a new 
SysDc write command overrides the current SysDc fill command. It is the responsi- 
bility of the external system to schedule SysDc fill or write commands so that there 
is no conflict on the SysData bus. 


3. The 21264A samples fill data in the GCLK clock domain, 10 + SYSDC_DELAY 
GCLK cycles after perceiving the SysDc fill command. The Cbox CSR 
SYSDC_DELAY[4:0] provides GCLK granularity for precisely placing fills into 
the processor pipeline discussed in Section 2.2. 


Table 4-26 shows four example configurations and their use of
SYSDC_DELAY[4:0].


Table 4-26 Cbox CSR SYSDC_DELAY[4:0] Examples

System      Bit Rate   System Framing Clock Ratio¹   SYSDC_DELAY

System 1    1.5X       4:1                           5 (3 SYSCLK cycles)
System 2    2.0X       2:1                           2 (3 SYSCLK cycles)
System 3    2.5X       2:1                           0 (2 SYSCLK cycles)
System 4    4X         2:1                           6 (2 SYSCLK cycles)

1 The system framing clock ratio is the number of INT_FWD_CLK cycles per
  SYSCLK cycle.

System 1 has six GCLKs to every SYSCLK and only sends 4-cycle commands to the
21264A. Thus, a period of three SYSCLKs between the SysDc command and data
leaves a period of 15 GCLKs between SysDc and data (SysDc is in the middle of the 4-
cycle command). A SYSDC_DELAY[4:0] of five would align sampling and receipt of
SysData.


System 2 has four GCLKs in every SYSCLK, so leading data by three SYSCLK
cycles, and programming the SYSDC_DELAY[4:0] to two, aligns sampling and receiv-
ing.

Timing for systems 3 and 4 is derived in a similar manner. 
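
The arithmetic behind Table 4-26 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the cmd_offset_bits parameter are hypothetical and do not come from this specification. The offset models the remark that, for System 1, SysDc sits in the middle of a 4-cycle command (two bit times into the SYSCLK period); the other three systems are assumed to need no offset because that reproduces their table entries.

```python
from fractions import Fraction

# Fills are sampled 10 + SYSDC_DELAY GCLK cycles after SysDc is perceived.
FILL_SAMPLE_BASE = 10


def sysdc_delay(bit_rate, framing_ratio, lead_sysclks, cmd_offset_bits=0):
    """Return the SYSDC_DELAY value that aligns fill sampling with data arrival.

    bit_rate        -- GCLKs per INT_FWD_CLK bit time, e.g. "1.5"
    framing_ratio   -- INT_FWD_CLK cycles per SYSCLK, e.g. 4 for 4:1
    lead_sysclks    -- SYSCLK cycles by which the SysDc command leads the data
    cmd_offset_bits -- hypothetical knob: bit times into the SYSCLK period at
                       which SysDc is perceived (2 for System 1)
    """
    bit_rate = Fraction(bit_rate)
    gclks_per_sysclk = bit_rate * framing_ratio
    gclks_to_data = lead_sysclks * gclks_per_sysclk - cmd_offset_bits * bit_rate
    delay = gclks_to_data - FILL_SAMPLE_BASE
    # SYSDC_DELAY[4:0] is a 5-bit field, so only whole values 0..31 fit.
    if delay != int(delay) or not 0 <= delay <= 31:
        raise ValueError(f"no SYSDC_DELAY[4:0] value aligns this setup: {delay}")
    return int(delay)


if __name__ == "__main__":
    print(sysdc_delay("1.5", 4, 3, cmd_offset_bits=2))  # System 1
    print(sysdc_delay("2.0", 2, 3))                      # System 2
    print(sysdc_delay("2.5", 2, 2))                      # System 3
    print(sysdc_delay("4", 2, 2))                        # System 4
```

The sketch does not check the rule in the note that follows, which bounds SYSDC_DELAY by the spacing of consecutive SysDc commands.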


Note: The maximum valid value for SYSDC_DELAY must be less than the min-
imum number of GCLK cycles between two consecutive SysDc com-
mands to the 21264A.


If a fast data transfer is interrupted and fails to complete, the system must use the con- 
ventional protocol to send a SysDc WriteData command to the 21264A, removing the 
desired data buffer. Section 4.7.8.3 describes the timing events for transferring data 
from the 21264A to the system. 


4.7.8.3 Fast Data Disable Mode 


The system controls all data movement to and from the 21264A. Movement of data into 
and out of the 21264A is preceded by a SysDc command. The 21264A drivers are only 
enabled for the duration of an 8-cycle transfer of data from the 21264A to the system. 
Systems must ensure that there is no overlap of enabled drivers and that there is ade- 
quate settle time on the SysData bus. 


Given a SysDc fill command, the 21264A samples data 10 + SYSDC_DELAY GCLK 
cycles after the command is perceived within the 21264A clock domain. Because there 
is no linkage with the output driver, fills into the 21264A are not affected by the 
SYS_RCV_MUX_PRESET[1:0] value. 


In both modes, given a SysDc write command, the 21264A looks for the next SYSCLK 
edge 8.5 cycles after perceiving the SysDc write command in its clock domain. Because 
the SysDc write command must be perceived before its use, SysDc write commands are 
dependent upon the amount of delay introduced by Cbox CSR 
SYS_RCV_MUX_CNT_PRESET[1:0].


Table 4-27 lists information for the four timing examples. In Table 4-27, note the fol-
lowing:

• SysDc write commands are not affected by the SYSDC_DELAY parameter.

• The SYS_RCV_MUX_PRESET adds delay at the rate of one INT_FWD_CLK at a
  time. For example, adding the delay of one bit time to system 1 adds 1.5 GCLK
  cycles to the delay and drives the SysDc write command-to-data relationship from
  one to two SYSCLKs.

• For write transfers, the 21264A drivers are enabled on the preceding GCLK
  BPHASE, before the start of a write transfer, and disabled on the succeeding GCLK
  BPHASE at the end of the write transfer. The write data is enveloped by the
  21264A drivers to guarantee that every data transfer has the same data valid win-
  dow.


Table 4-27 Four Timing Examples 


System      Bit Rate   System Framing Clock Ratio¹   Write Data

System 1    1.5X       4:1                           2 SYSCLKs
System 2    2.0X       2:1                           3 SYSCLKs
System 3    2.5X       2:1                           2 SYSCLKs
System 4    4X         2:1                           2 SYSCLKs

1 The system framing clock ratio is the number of INT_FWD_CLK cycles per
  SYSCLK cycle.


The four examples described here assume no skew for the 2.0X and 4.0X cases and one 
bit time of skew for the 1.5X and 2.5X cases. 


For system 1, the distance between SysDc and the first SYSCLK is nine GCLK cycles,
but the additional delay of one bit time (1.5 GCLKs) puts the actual delay after perceiv-
ing the SysDc command at 7.5 GCLKs, which misses the 8.5-cycle constraint. There-
fore, the 21264A drives data two SYSCLKs after receiving the SysDc write command.


For system 2, the distance between SysDc and the second SYSCLK is eight GCLK
cycles, which also misses the 8.5-cycle constraint, so the 21264A drives data three
SYSCLK cycles after receiving the SysDc write command (12 cycles).


The other two cases are derived in a similar manner. 
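
The reasoning for systems 1 and 2 can be sketched as a small search for the first SYSCLK edge that satisfies the 8.5-cycle constraint. This is a sketch under assumptions, not the device's actual logic: the function and parameter names are hypothetical, and the first-edge distance for system 2 (four GCLKs) is inferred from the statement that its second SYSCLK edge is eight GCLKs away with a four-GCLK SYSCLK period.

```python
from fractions import Fraction

# A SysDc write needs a SYSCLK edge at least 8.5 GCLKs after it is perceived.
WRITE_CONSTRAINT = Fraction(17, 2)


def write_data_sysclks(gclks_per_sysclk, first_edge, skew=0):
    """Count SYSCLK edges until one meets the 8.5-GCLK constraint.

    gclks_per_sysclk -- GCLK cycles in one SYSCLK period
    first_edge       -- GCLKs from the SysDc command to the first SYSCLK edge
    skew             -- receive delay in GCLKs that postpones perceiving SysDc
                        (one bit time in the skewed examples)
    """
    period = Fraction(gclks_per_sysclk)
    if period <= 0:
        raise ValueError("SYSCLK period must be positive")
    # Distance from the moment SysDc is perceived to the first edge.
    edge = Fraction(first_edge) - Fraction(skew)
    count = 1
    while edge < WRITE_CONSTRAINT:
        edge += period
        count += 1
    return count


if __name__ == "__main__":
    print(write_data_sysclks(6, 9, skew="1.5"))  # system 1
    print(write_data_sysclks(4, 4))              # system 2
```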
4.7.8.4 SysDataInValid_L and SysDataOutValid_L


The SysDataValid signals (SysDataInValid_L and SysDataOutValid_L) are driven by
the system and control the rate of data delivery to and from the 21264A.


SysDataInValid_L


The SysDataInValid_L signal controls the flow of data into the 21264A, and may be 
used to introduce an arbitrary number of cycles between octaword transfers into the 
21264A. The rules for using SysDataInValid_L follow: 


1. The SysDataInValid_L signal must be asserted for both cycles of a SysDc fill
command, and two quadwords of data must be delivered to the 21264A in succeed- 
ing bit-clock cycles with the appropriate timing in reference to the SysDc fill com- 
mand (SYSDC_DELAY + 10 CPU cycles). 


2. Any number of bubble cycles can be introduced within the fill by deasserting 
SysDataInValid_L between octaword transfers. 


3. The transfer of fill data can continue by asserting SysDataInValid_L for at least 
two bit-clock cycles, and delivering data SYSDC_DELAY + 10 CPU cycles after 
the assertion of SysDataInValid_L. 

4. The 21264A must see SysDataInValid_L asserted for eight data cycles in order to
complete a fill. When the eighth cycle of an asserted SysDataInValid_L is per-
ceived by the 21264A, the transfer is complete.


5. Systems that do not use SysDataInValid_L may tie the pin to the asserted state. 


If SYSDC_DELAY is greater than the bit-time of a transfer, the SysDataInValid_L
signal must be internally pipelined. To enable the correct sampling of
SysDataInValid_L, the 21264A provides a delay, with Cbox CSR
DATA_VALID_DELAY[1:0], that is equal to SYSDC_DELAY[4:0]/bit-time. For
example, consider system 1 in Table 4-26, which has a SYSDC_DELAY of five
GCLKs. Running at a bit-time of 1.5X, the DATA_VALID_DELAY[1:0] is pro-
grammed with a value of three.
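
The system 1 example can be reproduced with a short Python sketch. The text gives only the quotient SYSDC_DELAY/bit-time and one worked value (5 / 1.5 giving three), so truncating the quotient to a whole number is an assumption here, as is the function name.

```python
from fractions import Fraction


def data_valid_delay(sysdc_delay, bit_time):
    """Return a DATA_VALID_DELAY[1:0] value for the given SYSDC_DELAY.

    sysdc_delay -- SYSDC_DELAY[4:0] in GCLK cycles
    bit_time    -- GCLKs per bit time, e.g. "1.5"
    Assumption: the quotient is truncated toward zero.
    """
    value = int(Fraction(sysdc_delay) / Fraction(bit_time))
    # DATA_VALID_DELAY[1:0] is a 2-bit field.
    if not 0 <= value <= 3:
        raise ValueError(f"{value} does not fit in DATA_VALID_DELAY[1:0]")
    return value


if __name__ == "__main__":
    print(data_valid_delay(5, "1.5"))  # system 1 from Table 4-26
```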


SysDataOutValid_L 


Systems that use a ratio of 1:1 for SYSCLK:INT_FWD_CLK may control the flow of 
data out of the 21264A by using SysDataOutValid_L as follows: 


1. The SysDataOutValid_L pin must be asserted for at least the first cycle of the 
SysDc write command that initiates a write transfer. 


2. Any number of bubble cycles may be introduced between data transfers by
deasserting SysDataOutValid_L.


3. The 21264A must see the SysDataOutValid_L signal asserted for eight data cycles 
to complete a write transaction, and when the eighth cycle of an asserted 
SysDataOutValid_L is perceived by the 21264A, the transfer is complete. 


4.7.8.5 SysFillValid_L 


The SysFillValid_L pin, when asserted, validates the current memory and I/O data
transfer into the 21264A. The system designer may tie this pin to the asserted state (val-
idating all fills), or use it to enable or cancel fills as they progress. The 21264A samples
SysFillValid_L at D1 time (when the 21264A samples the second data cycle).

If SysFillValid_L is asserted at D1 time, the fill will continue uninterrupted. If it is not
asserted, the 21264A cancels the fill, but expects all eight QWs of data to arrive at its
system bus before continuing to the next fill. Also, the 21264A maintains the state of
the MAF, expecting another valid fill to the same MAF entry. Figure 4-6 illustrates
SysFillValid_L timing.


Figure 4-6 SysFillValid_L Timing 


[Timing diagram not reproduced. Recoverable labels: Transport Delay on Address,
SysFillValid_L.]


4.7.8.6 Data Wrapping


All data movement between the 21264A and the system is composed of 64 bytes in 
eight cycles on the data bus. All 64 bytes of memory data are valid. This applies to 
memory read transactions, memory write transactions, and system probe read transac- 
tions. The wrap order is interleaved. The internal data bus, which delivers data to the 
functional units and the Dcache, is 16 bytes wide, and so no transfers happen until two
data cycles occur on the interface.


Table 4—28 lists the rules for data wrapping. I/O read and write addresses on the 
SysAddOut bus point to the desired byte, word, LW, or QW, with a combination of 
SysAddOut_L[5:3] and the mask field [7:0]. 


Table 4-28 Data Wrapping Rules

                 Significant        Mask
Command          Address Bits       Type   Rules

ReadQW and       SysAddOut_L[5:3]   QW     SysAddOut_L[5:3] contains the exact PA bits of the
WrQW                                       first LDQ or STQ to the block. The mask bits point
                                           to the valid QWs merged in ascending order.

ReadLW and       SysAddOut_L[5:3]   LW     SysAddOut_L[5:3] contains the exact PA bits of the
WrLW                                       first LDL or STL to the block. The mask bits point
                                           to the valid LWs merged in ascending order within
                                           one hexword.

LDByte/Word      SysAddOut_L[5:3]   Byte   SysAddOut_L[5:3] contains the exact QW PA bits of
and                                        the LDByte/Word or STByte/Word instruction. The
STByte/Word                                mask bits point to the valid byte in the QW.


The order in which data is provided to the 21264A (for a memory or I/O fill) or moved
from the 21264A (write victims or probe reads) can be determined by the system. The
system chooses to reflect back the same low-order address bits and the corresponding 
octaword found in the SysAddOut field or the system chooses any other starting point 
within the block. 


SysDc commands for the ReadData, ReadDataShared, and WriteData groups require
that systems define the position of the first QW by inserting the appropriate value of
SysAddOut_L[5:4] into bits [1:0] of the command field. The recommended starting
point is the QW pointed to by the 21264A; however, some systems may find it more
beneficial to begin the transfer elsewhere. The system must always indicate the starting
point to the 21264A. The wrap order for subsequent QWs is interleaved.


Table 4—29 defines the method for systems to specify wrap and deliver data. 


Table 4-29 System Wrap and Deliver Data

Source/
Destination   SysDc[4:2]                   SysDc[1:0]         Size               Rules

Memory        100 (ReadData)               SysAddOut_L[5:4]   Block (64 Bytes)   See Note 1
Memory        101 (ReadDataDirty)          SysAddOut_L[5:4]   Block (64 Bytes)   See Note 1
Memory        110 (ReadDataShared)         SysAddOut_L[5:4]   Block (64 Bytes)   See Note 1
Memory        111 (ReadDataShared/Dirty)   SysAddOut_L[5:4]   Block (64 Bytes)   See Note 1
Memory        010 (WriteData)              SysAddOut_L[5:4]   Block (64 Bytes)   See Note 1
I/O           100 (ReadData)               SysAddOut_L[5:4]   QW (8-64 Bytes)    See Note 1
I/O           100 (ReadData)               SysAddOut_L[4:3]   LW (4-32 Bytes)    See Note 2
I/O           100 (ReadData)               SysAddOut_L[4:3]   Byte/Word          See Note 2
I/O           010 (WriteData)              SysAddOut_L[5:4]   QW (8-64 Bytes)    See Note 1
I/O           010 (WriteData)              SysAddOut_L[5:4]   LW (4-32 Bytes)    See Note 1
I/O           010 (WriteData)              SysAddOut_L[5:4]   Byte/Word          See Note 1


Note 1: Transfers to and from the 21264A have eight data cycles for a total of eight 
quadwords. The starting point is defined by the system. The preferred start- 
ing point is the one pointed to by SysAddOut_L[5:4]. Systems can insert 
the SysAddOut_L[5:4] into the SysDc[1:0] field of the command. See 
Table 4—30 for the wrap order. 


Note 2: LW and byte/word read transfers differ from all other transfers. The system 
unloads only four QWs of data into eight data cycles by sending each QW 
twice (referred to as double-pumped data transfer). The first QW returned 
is determined by SysAddOut_L[4:3]. The system again may elect to 
choose its own starting point for the transfer and insert that value into 
SysDc[1:0]. See Table 4-31 for the wrap order. 


Table 4-30 defines the interleaved scheme for the wrap order. 


Table 4-30 Wrap Interleave Order 


                    PA Bits [5:3] of Transferred QW

First quadword      000   010   100   110
Second quadword     001   011   101   111
Third quadword      010   000   110   100
Fourth quadword     011   001   111   101
Fifth quadword      100   110   000   010
Sixth quadword      101   111   001   011
Seventh quadword    110   100   010   000
Eighth quadword     111   101   011   001





Table 4-31 defines the wrap order for double-pumped data transfers. 


Table 4-31 Wrap Order for Double-Pumped Data Transfers

                    PA [5:3] of Transferred QW

First quadword      x00   x01   x10   x11
Second quadword     x00   x01   x10   x11
Third quadword      x01   x00   x11   x10
Fourth quadword     x01   x00   x11   x10
Fifth quadword      x10   x11   x00   x01
Sixth quadword      x10   x11   x00   x01
Seventh quadword    x11   x10   x01   x00
Eighth quadword     x11   x10   x01   x00
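

Both wrap tables can be regenerated with a few lines of Python. The sketch below relies on an observation rather than on anything this specification states: every column of Table 4-30 equals the starting quadword address XORed with the transfer index, and every column of Table 4-31 equals the starting PA[4:3] value XORed with half the transfer index. The function names are hypothetical.

```python
def interleaved_wrap(start_octaword):
    """PA[5:3] of the eight QWs in transfer order (Table 4-30).

    start_octaword -- the 2-bit value the system places in SysDc[1:0],
                      normally SysAddOut_L[5:4]
    """
    if not 0 <= start_octaword <= 3:
        raise ValueError("SysDc[1:0] is a 2-bit field")
    first_qw = start_octaword << 1  # first QW of the chosen octaword
    return [first_qw ^ i for i in range(8)]


def double_pumped_wrap(start_qw):
    """PA[4:3] of the QW sent in each of the eight data cycles (Table 4-31).

    start_qw -- the 2-bit value in SysDc[1:0], normally SysAddOut_L[4:3];
                each QW is sent twice, so the index advances every other cycle
    """
    if not 0 <= start_qw <= 3:
        raise ValueError("SysDc[1:0] is a 2-bit field")
    return [start_qw ^ (i // 2) for i in range(8)]


if __name__ == "__main__":
    for start in range(4):
        print([format(qw, "03b") for qw in interleaved_wrap(start)])
    for start in range(4):
        print(["x" + format(qw, "02b") for qw in double_pumped_wrap(start)])
```

Each printed row corresponds to one column of the respective table, read from top to bottom.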


4.7.9 Nonexistent Memory Processing 


Like its predecessors, the 21264A can generate references to nonexistent (NXM) mem- 
ory or I/O space. However, unlike the earlier Alpha microprocessor implementations, 
the 21264A can generate speculative references to memory space. To accommodate the 
speculative nature of the 21264A, the system must not generate or lock error registers 
because of speculative references. The 21264A translates all memory references 
through the translation lookaside buffer (TLB) and, in some cases, the 21264A may 
generate speculative references (instruction execution down mispredicted paths) to 
NXM space. In these cases, the system sends a SysDc ReadDataError and the 21264A
does the following:

• Delivers an all-ones pattern to all load instructions to the NXM address.

• Force-fails all store instructions to the NXM address (much like a STx_C
  failure).

• Invalidates the cache block at the same index by way of an atomic Evict
  command.


Table 4-32 shows each 21264A command, with NXM addresses, and the appropriate 
system response. 


Table 4-32 21264A Commands with NXM Addresses and System Response 


21264A Command
NXM Address        System/21264A Response

ProbeResponse      Probe responses for addresses to NXM space are of UNPREDICTABLE status.
                   Although the final status of a ReadDataError is Invalid, the 21264A fills
                   the block Valid/Clean and uses an atomic Evict command to invalidate the
                   block. Systems that send probes to NXM space to the 21264A must disregard
                   the probe result.

RdBlk              Load references to NXM space can be speculative. In this case, systems
RdBlkSpec          should respond with a SysDc ReadDataError fill that the 21264A uses to
RdBlkVic           service the original load/Istream command. If the original load command
                   was speculative, the 21264A will remove the load instruction that
                   generated the NXM command, and start processing instructions down the
                   correctly predicted path. If the command was not speculative, there must
                   be an error in the operating system mapping of a virtual address to an
                   illegal physical address, and the 21264A provides an all-ones pattern as
                   a signature for this bug. The NXM block is not cached in the Dcache or
                   Bcache.


RdBlkI             Istream references to NXM space can be speculative. In this case, systems
RdBlkSpecI         should respond with a SysDc ReadDataError fill, which the 21264A will use
RdBlkVicI          to service and execute the original Istream reference. If the original
                   Istream reference was speculative, the 21264A will remove the
                   instructions started after the mispredicted instruction that generated
                   the NXM reference, and start instruction processing down the correctly
                   predicted path. If the reference was not speculative, there must be an
                   error in the operating system mapping of a virtual address to an illegal
                   physical address, and the 21264A provides an all-ones pattern as a
                   signature for this bug. The NXM block is not cached in the Bcache, but
                   can be cached in the Icache.

RdBlkMod           Store instructions to NXM space initiate RdBlkMod commands. Again,
RdBlkModSpec       speculative store instructions are removed. Nonspeculative store
RdBlkModVic        instructions are forced to fail, much like STx_C instructions that fail.
                   The NXM block is not cached in the Dcache or Bcache.

WrVictimBlk        Dirty victims to NXM space are illegal. Systems should perform a machine
                   check, with the 21264A indicating a severe error.

CleanVictimBlk     The 21264A can generate CleanVictimBlk commands to NXM space if the Cbox
                   CSR BC_CLEAN_VICTIM[0] bit is asserted and a SysDc ReadDataError has been
                   generated. Systems that use clean victims must faithfully deallocate the
                   CleanVictim VAF entry.

Evict              If the Cbox CSR ENABLE_EVICT is asserted, the 21264A will generate Evict
                   commands to NXM space. Systems may use this command to invalidate their
                   duplicate tags. Systems must respond with SysDc ChangeToDirtyFail to
                   retire the NXM MAF entry.

RdBytes            Load instructions to I/O space are not speculative, so an I/O reference
RdLWs              to NXM space is an error. Systems must respond with ReadDataError and
RdQWs              should generate a machine check to indicate an operating system error.

WrBytes            Store instructions to I/O space are not speculative, so an I/O reference
WrLWs              to NXM space is an error. Systems must respond by deallocating the
WrQWs              appropriate IOWB entries, and should generate a machine check to indicate
                   an operating system error.

FetchBlk           Loads to noncached memory in NXM space may be speculative. Systems must
FetchBlkSpec       respond with a SysDc ReadDataError to retire the MAF entry.

CleanToDirty       ChangeToDirty commands to NXM space are impossible in the 21264A because
SharedToDirty      all NXM references to memory space are atomically filled with an Invalid
STCChangeToDirty   cache status.

InvalToDirty       InvalToDirty commands are not speculative, so InvalToDirty commands to
InvalToDirtyVic    NXM space indicate an operating system error. Systems should respond with
                   a SysDc ReadDataError, and should generate a machine check to indicate
                   the error.





4.7.10 Ordering of System Port Transactions 


This section describes ordering of system port transactions. The two classes of transac- 
tions are listed here: 


• 21264A commands and system probes
• System probes and SysDc transfers

4.7.10.1 21264A Commands and System Probes 


This section describes the interaction of 21264A-generated commands and system-gen- 
erated probes that reference the same cache block. Some definitions are presented here: 

• ProbeResponses generated by the 21264A respond to all system-generated probe
  commands. System-generated data transfer commands respond to all 21264A-gen-
  erated data transfer commands.

• The victim address file (VAF) and victim data buffer (VDB) entries each have inde-
  pendent valid bits for both a victim and a probe.

• Probe results indicate a hit on a VAF/VDB and when a WrVictim command has
  been sent to the system. Systems can decide whether to move the buffer once or
  twice.

• ProbeResponses are issued in the order that the system-generated probes were
  received; however, there is no requirement for the system to retain order when issu-
  ing release buffer commands.

• Probe processing can stall inside the 21264A when the probe entry index matches
  PA[19:6] of a previous probe entry in the VAF.

• The 21264A reserves one VAF entry for probe processing, so that VAF-full condi-
  tions cannot stall the processing of probes at the head of the queue.


Table 4—33 lists all interactions between pending internal 21264A commands and the 
Probe[2:0] command field, Next Cache Block State, described in Table 4-22. 


Table 4-33 shows the 21264A response to system probe and in-flight command interac- 
tion. In the table, note the following: 


• ReadBlkVic and ReadBlkModVic commands do not appear in Table 4-33. If there
  is interaction between the probe and the victim, it is the same as a WrVictimBlk
  command.

• Probes that invalidate locked blocks do not generate a ReadBlkMod command. The
  21264A fails the STx_C instruction as defined in the Alpha Architecture Handbook,
  Version 4.


Table 4-33 21264A Response to System Probe and In-Flight Command Interaction

Pending Internal
21264A Command     21264A Response to System Probe and In-Flight Command Interaction

ReadBlk            All read commands (RdBlk, RdBlkMod, Fetch, InvalToDirty) do not interact
ReadBlkMod         because the 21264A does not yet own the block.
FetchBlk
InvalToDirty

WrVictimBlk        This case assumes that a WrVictimBlk command has been sent to the system
                   and another agent has performed a load/store instruction to the same
                   address. The 21264A provides VAF hit information with the probe response
                   so that the system can manage the race condition between the WrVictimBlk
                   command from this processor and a possible WrVictimBlk command from the
                   probing processor. This race condition can be managed by either forcing
                   the completion of the WrVictimBlk command to memory before allowing
                   progress by the probing processor, or by killing the WrVictimBlk command
                   in this processor.

CleanToDirty       This case assumes that a SetDirty command has been sent to the system
SharedToDirty      environment because of a store instruction that hit in the 21264A caches
                   and that another processor has performed a load/store instruction to the
                   same address. The 21264A provides MAF hit information so that the system
                   can correctly respond to the Set/Dirty command. If the next state of the
                   probe was Invalid (the other processor performed a store instruction),
                   and the probe reached the system serialization point before the Set/Dirty
                   command, the system must either fail the Set/Dirty command or provide the
                   updated data from the other processor.

STCChangeToDirty   This case is similar to the CleanToDirty/SharedToDirty case above, except
                   that the initiating instruction for the Set/Dirty command is a STx_C. An
                   address match with an invalidating probe must fail the Set/Dirty command.
                   Delivering the updated data from the other processor is not an option
                   because of the requirements of the LDx_L/STx_C instruction pair.


4.7.10.2 System Probes and SysDc Commands 


Ordering of cache transactions at the system serialization point must be reflected in the 
21264A cache system. Table 4-34 shows the rules that a system must follow to control 
the order of cache status update within the 21264A cache structures (including the 
VAF) at the 21264A pins. 


Table 4-34 Rules for System Control of Cache Status Update Order

First        Second       Rule

Probe        Probe        To control the sequence of cache status updates between probes,
                          systems can present the probes in order to the 21264A, and the
                          21264A will update the appropriate cache state (including the
                          VAF) in order.

Probe        SysDc MAF    To ensure that a probe updates the internal cache status before a
                          SysDc MAF transaction (including fills and ChangeToDirtySuccess
                          commands), systems must wait for the probe response before
                          presenting the SysDc MAF command to the 21264A. To ensure that a
                          probe updates a VAF entry before a SysDc VAF (release buffer),
                          systems must wait for the probe response.

Probe        SysDc VAF    Same as Probe/SysDc MAF, above.

SysDc MAF    Probe        To ensure that a SysDc MAF command updates the 21264A cache system
                          before a probe to the same address, systems must deliver the D1
                          (the second QW of data delivered to the 21264A) before or in the
                          same cycle as the A3 of the probe (the last cycle of the 4-cycle
                          probe command). This rule also applies to ChangeToDirtySuccess
                          commands that have a virtual D0 and D1 transaction.

SysDc MAF    SysDc MAF    SysDc MAF transactions can be ordered into the 21264A by ordering
                          them appropriately at the 21264A interface.

SysDc MAF    SysDc VAF    SysDc MAF transactions and SysDc VAF transactions cannot interact
                          within the 21264A because the 21264A does not generate MAF
                          transactions to the same address as existing VAF transactions.

SysDc VAF    Probe        To ensure that a SysDc VAF invalidates a VAF entry before a probe
                          to the same address, the SysDc VAF command must precede the first
                          cycle of the 4-cycle probe command.

SysDc VAF    SysDc MAF    SysDc MAF transactions and SysDc VAF transactions cannot interact
                          within the 21264A because the 21264A does not generate MAF
                          transactions to the same address as existing VAF transactions.

SysDc VAF    SysDc VAF    SysDc VAF transactions can be ordered into the 21264A by ordering
                          them appropriately at the 21264A interface.


4.8 Bcache Port


The 21264A supports a second-level cache (Bcache) with 64-byte blocks. The Bcache
size can be 1MB, 2MB, 4MB, 8MB, or 16MB. The Bcache port has a 144-bit data bus
that is used for data transfers between the 21264A and the Bcache. All Bcache control
and address signal lines are clocked synchronously on Bcache clock cycle boundaries.

The Bcache supports the following multiples of the GCLK period: 1.5X (dual-data 
mode only), 2X, 2.5X, 3X, 3.5X, 4X, 5X, 6X, 7X, and 8X. However, the 21264A 
imposes a maximum Bcache clock period based on the SYSCLK ratio. Table 4—35 lists 
the range of maximum Bcache clock periods. Section 4.7.8.2 describes fast mode. 


Table 4-35 Range of Maximum Bcache Clock Ratios

                 Bcache Clock Ratio with   Bcache Clock Ratio with
SYSCLK Ratio     Fast Mode Enabled         Fast Mode Disabled

1.5X             4.0X                      7.0X
2.0X             4.0X                      7.0X
2.5X             5.0X                      8.0X
3.0X             6.0X                      8.0X
3.5X             7.0X                      8.0X
4.0X             7.0X                      8.0X
5.0X             8.0X                      8.0X
6.0X             8.0X                      8.0X
7.0X             8.0X                      8.0X
8.0X             8.0X                      8.0X
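

A configuration check against Table 4-35 might look like the following Python sketch. It assumes that each table entry is an upper bound on the Bcache clock ratio for the given SYSCLK ratio, which is how the surrounding text reads; the dictionary and function names are hypothetical.

```python
# Table 4-35 transcribed: SYSCLK ratio -> (limit with fast mode enabled,
#                                          limit with fast mode disabled)
MAX_BCACHE_RATIO = {
    1.5: (4.0, 7.0),
    2.0: (4.0, 7.0),
    2.5: (5.0, 8.0),
    3.0: (6.0, 8.0),
    3.5: (7.0, 8.0),
    4.0: (7.0, 8.0),
    5.0: (8.0, 8.0),
    6.0: (8.0, 8.0),
    7.0: (8.0, 8.0),
    8.0: (8.0, 8.0),
}


def bcache_ratio_allowed(sysclk_ratio, bcache_ratio, fast_mode):
    """Return True if bcache_ratio is within the Table 4-35 limit.

    Assumption: the table entry is the largest permitted Bcache clock ratio.
    """
    if sysclk_ratio not in MAX_BCACHE_RATIO:
        raise ValueError(f"unsupported SYSCLK ratio: {sysclk_ratio}")
    fast_limit, slow_limit = MAX_BCACHE_RATIO[sysclk_ratio]
    return bcache_ratio <= (fast_limit if fast_mode else slow_limit)


if __name__ == "__main__":
    print(bcache_ratio_allowed(1.5, 5.0, fast_mode=True))
    print(bcache_ratio_allowed(1.5, 5.0, fast_mode=False))
```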


The 21264A provides a range of programmable Cbox CSRs to manipulate the Bcache 
port pins so that a variety of industry-standard SSRAMs can communicate efficiently 
with the 21264A. The following SSRAMs can be used:


• Nonburst mode Reg/Reg late-write SSRAMs


• Burst mode Reg/Reg late-write dual-data SSRAMs


4.8.1 Bcache Port Pins 


Table 3-1 defines the 21264A signal types referred to in this section. Table 4-36 lists
the Bcache port pin groups along with their type, number, reference clock, and func-
tional description.


Table 4-36 Bcache Port Pins 


Pin Name               Type       Count  Reference Clock            Description

BcAdd_H[23:4]          O_PP       20     Int_Index_BcClk            Bcache index
BcCheck_H[15:0]        B_DA_PP    16     Int_Data_BcClk => output   ECC check bits for BcData
                                         BcDataInClk_H => input
BcData_H[127:0]        B_DA_PP    128    Int_Data_BcClk => output   Bcache data
                                         BcDataInClk_H => input
BcDataInClk_H[7:0]     I_DA       8      NA                         Bcache data input clocks
BcDataOE_L             O_PP       1      Int_Index_BcClk            Bcache data output enable/chip
                                                                    select
BcDataOutClk_H[3:0]    O_PP       8      NA                         Bcache data clocks, high and low
BcDataOutClk_L[3:0]                                                 versions
BcDataWr_L             O_PP       1      Int_Index_BcClk            Bcache data write enable
BcLoad_L               O_PP       1      Int_Index_BcClk            Bcache burst enable
BcTag_H[42:20]         B_DA_PP    23     Int_Data_BcClk => output   Bcache tag data
                                         BcTagInClk_H => input
BcTagDirty_H           B_DA_PP    1      Int_Data_BcClk => output   Bcache tag dirty bit
                                         BcTagInClk_H => input
BcTagInClk_H           I_DA       1      NA                         Tag input data reference clock
BcTagOE_L              O_PP       1      Int_Index_BcClk            Bcache tag output enable/chip
                                                                    select
BcTagOutClk_H          O_PP       2      NA                         Bcache tag clock, high and low
BcTagOutClk_L                                                       versions
BcTagParity_H          B_DA_PP    1      Int_Data_BcClk => output   Bcache tag parity bit
                                         BcTagInClk_H => input
BcTagShared_H          B_DA_PP    1      Int_Data_BcClk => output   Bcache tag shared bit
                                         BcTagInClk_H => input
BcTagValid_H           B_DA_PP    1      Int_Data_BcClk => output   Bcache tag valid bit
                                         BcTagInClk_H => input
BcVref                 I_DC_REF   1      NA                         Input reference voltage for tag
                                                                    data
BcTagWr_L              O_PP       1      Int_Index_BcClk            Bcache data write enable


4.8.2 Bcache Clocking 


For clocking, the Bcache port pins can be divided into three groups. 


1. The Bcache index pins (address and control) are referenced to Int_Add_BcClk, an
   internal version of the Bcache forwarded clock. The index pins are valid for the
   whole period of the Int_Add_BcClk. The index pins are:

   BcAdd_H[23:4]
   BcDataOE_L
   BcDataWr_L
   BcLoad_L
   BcTagOE_L
   BcTagWr_L

2. The data pins, when driven as outputs, are referenced to Int_Data_BcClk, another
   internal version of the Bcache forwarded clock. The data pins, when used as inputs,
   can be referenced to the incoming Bcache clocks, BcDataInClk_H[7:0] and
   BcTagInClk_H. Int_Data_BcClk can be delayed relative to Int_Add_BcClk from
   0 to 3 GCLK cycles by using Cbox CSR BC_CPU_CLK_DELAY[1:0]. The data
   pins are:

   BcCheck_H[15:0]
   BcData_H[127:0]
   BcTag_H[42:20]
   BcTagDirty_H
   BcTagParity_H
   BcTagShared_H
   BcTagValid_H

3. The Bcache clock pins (BcDataOutClk_x[3:0] and BcTagOutClk_x) clock the
   index and data pins at the SSRAMs. These clocks can be delayed from
   Int_Data_BcClk from 0 to 2 GCLK phases (half cycles) using Cbox CSR
   BC_CLK_DELAY[1:0].

Table 4-37 provides the BC_CPU_CLK_DELAY[1:0] values, which give the delay
from BC_ADDRESS to BC_WRITE_DATA (and BC_CLOCK_OUT) in GCLK
cycles.


Table 4-37 BC_CPU_CLK_DELAY[1:0] Values

BC_CPU_CLK_DELAY[1:0] Value    GCLK Cycles of Delay

0                              0
1                              1
2                              2
3                              3


In the 21264A topology, the index pins are loaded by all the SSRAMs, while the clock
and data pins see a limited load. This arrangement requires a relatively large amount of
delay between the index pins and the Bcache clock pins to meet the setup constraints at
the SSRAMs. The 21264A Cbox CSRs can provide a programmable amount of delay
between the index and clock pins by using Cbox CSRs BC_CPU_CLK_DELAY[1:0]
and BC_CLK_DELAY[1:0].


Table 4-38 provides the BC_CLK_DELAY[1:0] values, which give the delay from
BC_WRITE_DATA to BC_CLOCK_OUT, in GCLK phases.


Table 4-38 BC_CLK_DELAY[1:0] Values

BC_CLK_DELAY[1:0] Value    GCLK Phases

0                          Invalid (turns off BC_CLOCK_OUT)
1                          0
2                          1
3                          2


With BC_CPU_CLK_DELAY[1:0] and BC_CLK_DELAY[1:0], a 500-MHz 21264A (2-ns GCLK)
can provide up to 8 ns (3 cycles x 2 ns + 2 phases x 1 ns) of delay between the index
and the outgoing forwarded clocks. The relative loading difference between the data and
the clock is minimal, so Cbox CSR BC_CLK_DELAY[1:0] alone is sufficient to provide
the delay needed for the setup constraint at the Bcache data register.
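
The delay arithmetic above can be sketched as a short calculation. This is an illustrative Python sketch only: the function name and argument names are hypothetical, and the encodings are taken from Tables 4-37 and 4-38 (cycles for BC_CPU_CLK_DELAY, phases for BC_CLK_DELAY, one phase being half a GCLK cycle).

```python
def index_to_clock_delay_ns(gclk_period_ns, bc_cpu_clk_delay, bc_clk_delay):
    """Delay between the index pins and the outgoing forwarded clock, in ns.

    Hypothetical helper: the CSR encodings follow Tables 4-37 and 4-38.
    """
    if not 0 <= bc_cpu_clk_delay <= 3:
        raise ValueError("BC_CPU_CLK_DELAY[1:0] is a 2-bit field")
    if bc_clk_delay not in (1, 2, 3):
        # A value of 0 is listed as invalid (it turns off BC_CLOCK_OUT).
        raise ValueError("BC_CLK_DELAY[1:0] must be 1, 2, or 3")
    cycles = bc_cpu_clk_delay        # Table 4-37: value equals GCLK cycles
    phases = bc_clk_delay - 1        # Table 4-38: value minus one GCLK phases
    return cycles * gclk_period_ns + phases * gclk_period_ns / 2


# 500-MHz part (2-ns GCLK) with both CSRs at their maximum settings.
print(index_to_clock_delay_ns(2.0, 3, 3))
```

With both fields set to 3, the sketch reproduces the 8-ns figure quoted in the text.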


4.8.2.1 Setting the Period of the Cache Clock 


The free-running Bcache clocks are derived from the 21264A GCLK. The period of the
Bcache clocks is programmed using the following three Cbox CSRs:


1. BC_CLK_LD_VECTOR[15:0] 
2. BC_BPHASE_LD_VECTOR[3:0] 


3. BC_FDBK_EN[7:0] 


To program these three CSRs, the programmer must know the bit-rate of the Bcache
data, and whether only the rising edge or both edges of the clock are used to latch data.
For example, a 200-MHz late-write SSRAM has a data period of 5 ns. For a 2-ns
GCLK, the READCLK_RATIO must be set to 2.5X. This part is called a 2.5X SD
(single-data) part.
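
The ratio in this example is simply the data period divided by the GCLK period. A minimal Python sketch follows; the function name and the SD_RATIOS set are hypothetical names introduced here, and the set only transcribes the single-data ratios that Table 4-39 lists.

```python
from fractions import Fraction

# Single-data ratios listed in Table 4-39 (2.0X through 8.0X).
SD_RATIOS = {Fraction(n, 2) for n in (4, 5, 6, 7, 8, 10, 12, 14, 16)}


def bcache_ratio(data_period_ns, gclk_period_ns):
    """GCLK cycles per Bcache data transfer, kept exact as a fraction."""
    return Fraction(data_period_ns) / Fraction(gclk_period_ns)


ratio = bcache_ratio(5, 2)           # 200-MHz SSRAM data, 2-ns GCLK
print(float(ratio), ratio in SD_RATIOS)
```

Using exact fractions avoids floating-point surprises when checking a ratio against the table.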

Table 4-39 shows how the three CSRs are programmed for single-data devices.


Table 4-39 Program Values to Set the Cache Clock Period (Single-Data)

Bcache Transfer    BC_CLK_LD_VECTOR(1)    BC_BPHASE_LD_VECTOR(1)    BC_FDBK_EN(1)
2.0X-SD            5555                   0                         01
2.5X-SD            94A5                   [illegible]               02
3.0X-SD            9249                   [illegible]               02
3.5X-SD            4C99                   [illegible]               04
4.0X-SD            3333                   [illegible]               01
5.0X-SD            8C63                   [illegible]               02
6.0X-SD            71C7                   [illegible]               10
7.0X-SD            C387                   [illegible]               04
8.0X-SD            0F0F                   [illegible]               01

(1) These are hexadecimal values.


With the exception of the 2.5X-SD and 3.5X-SD cases, the clock waveform generated 
by the 21264A for the forwarded clocks has a 50-50 duty cycle. In the 2.5X-SD case, 
the 21264A produces an asymmetric clock that is high for two GCLK phases and low 
for three phases. Likewise, for the 3.5X-SD case, the 21264A produces an asymmetric 
clock that is high for three GCLK phases and low for four GCLK phases. Also, for both 
of these cases, the 21264A will only start transactions on the rising edge of the GCLK
and the Bcache clock. The 1.5X-SD case is not supported.


A dual-data rate (DDR) SSRAM’s data rate is derived in a similar manner, except that 
because both edges of the clock are used, the SSRAM clock generated is 2X the period 
of the data. This part is called a 2.5X DDR SSRAM. 


Table 4-40 shows how the three CSRs are programmed for dual-data devices. 


Table 4-40 Program Values to Set the Cache Clock Period (Dual-Data Rate)

Bcache Transfer    BC_CLK_LD_VECTOR(1)    BC_BPHASE_LD_VECTOR(1)    BC_FDBK_EN(1)
1.5X-DD            9249                   [illegible]               [illegible]
2.0X-DD            3333                   [illegible]               [illegible]
2.5X-DD            8C63                   [illegible]               [illegible]
3.0X-DD            71C7                   [illegible]               [illegible]
3.5X-DD            C387                   [illegible]               [illegible]
4.0X-DD            0F0F                   [illegible]               [illegible]
5.0X-DD            7C1F                   0                         40
6.0X-DD            F03F                   0                         10
7.0X-DD            C07F                   0                         04
8.0X-DD            00FF                   0                         01

(1) These are hexadecimal values.


In addition to programming the clock CSRs, the data-sample/drive Cbox CSRs, at the 
pads, must be set appropriately. Table 4-41 lists these CSRs and provides their pro- 
grammed value. 


Table 4-41 Data-Sample/Drive Cbox CSRs 


Cbox CSR                  Description

BC_DDM_FALL_EN[0]         Enables the update of the 21264A's Bcache outputs referenced
                          to the falling edge of the Bcache forwarded clock. Dual-data
                          RAMs assert this CSR.

BC_TAG_DDM_FALL_EN[0]     Enables the update of the 21264A's Bcache tag outputs
                          referenced to the falling edge of the Bcache forwarded clock.
                          Always deasserted.

BC_DDM_RISE_EN[0]         Enables the update of the 21264A's Bcache outputs referenced
                          to the rising edge of the Bcache forwarded clock. Always
                          asserted.

BC_TAG_DDM_RISE_EN[0]     Enables the update of the 21264A's Bcache tag outputs
                          referenced to the rising edge of the Bcache forwarded clock.
                          Always asserted.

BC_DDMF_ENABLE[0]         Enables the rising edge of the Bcache forwarded clock.
                          Always asserted.

BC_DDMR_ENABLE[0]         Enables the falling edge of the Bcache forwarded clock.
                          Always asserted.

BC_FRM_CLK[0]             Forces the 21264A to only start Bcache transactions on the
                          rising edge of Bcache clocks that also coincide with the
                          rising edge of GCLK. Must be asserted for all dual-data parts
                          and single-data parts at 2.5X and 3.5X.

BC_CLKFWD_ENABLE[0]       Enables clock forward enable. Always asserted.





4.8.3 Bcache Transactions 


The Cbox uses the programmed clock values to start data read, tag read, data write, and 
tag write transactions on the rising edge of a Bcache clock. The Cbox can also be con- 
figured to introduce a programmable number of bubbles when changing between write 
and read commands. The following three sections describe these Bcache transactions. 


4.8.3.1 Bcache Data Read and Tag Read Transactions 


The 21264A always reads four pieces of data (64 bytes) from the Bcache during a data 
read transaction, and always interrogates the tag array on the first cycle. Once started, 
data read transactions are never cancelled. Assuming that the appropriate values have 
been programmed for the Bcache clock period, and with satisfactory delay parameters 
for the SSRAM setup/hold Bcache address latch requirements, a Bcache read command 
proceeds through the 21264A Cbox as follows: 

1. When the 21264A clocks out the first address value on the Bcache index pins with
the appropriate Int_Add_BcClk value, the Cbox loads the values of Cbox CSR
BC_LAT_DATA_PATTERN[31:0] and Cbox CSR
BC_LAT_TAG_PATTERN[23:0] into two shift registers, which shift during every
GCLK cycle.


2. The address and control pins are latched into the SSRAMs. During the next cycle, 
the SSRAMs provide data and tag information to the 21264A. 


3. Using the returning forwarded clocks (BcDataInClk_H[7:0], BcTagInClk_H), the 
data/tag information is loaded into the 21264A clock forwarding queue for the 
Bcache. 


4. Based on the value of BC_RCV_MUX_PRESET_CNT[1:0] (the unload pointer),
the result of a Bcache write command is loaded into a 21264A GCLK (BPHASE)
register.


5. The Cbox CSRs BC_LAT_DATA_PATTERN[31:0] and
BC_LAT_TAG_PATTERN[23:0] contain the GCLK frequency at which the output
of the clock forward FIFO can be consumed by the processor. This provides GCLK
granularity for the Bcache interface, so that the 21264A can minimize latency to the
Bcache. When the values based on these Cbox CSRs are shifted down to the bottom
of the shift register, the processor samples the Bcache data and delivers it to the
consumers of load data in the 21264A functional units.


For example, when a 2.5X-SD SSRAM has a latency of eight GCLK cycles from
BcAdd_H[23:4] to the output of Bcache FIFO, Cbox CSR
BC_LAT_DATA_PATTERN[31:0] is programmed to 0x948 and Cbox CSR
BC_LAT_TAG_PATTERN[23:0] is programmed to 0x8. The data pattern contains the
placement for four pieces of data and the aggregate rate of the data is 2.5X. In addition,
bit one of the BC_LAT_DATA_PATTERN is placed at a GCLK latency of six GCLK
cycles, which is the minimum latency supported by the 21264A. The
BC_LAT_TAG_PATTERN contains the placement of the tag data to the 21264A.


A shift of one to the left increases the latency of the Bcache transfer to nine GCLK 
cycles, and a shift to the right reduces the latency of the Bcache transfer to seven GCLK 
cycles. 
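
The pattern values in this example can be decoded with a few lines of code. The sketch below is an interpretation, not a statement of the hardware implementation: it assumes that bit n of a latency pattern corresponds to a latency of n + 5 GCLK cycles, which is inferred from the statement that bit one sits at six cycles and from the eight-cycle example. The function and constant names are hypothetical.

```python
# Assumption: "bit one" of the pattern corresponds to the minimum latency
# of six GCLK cycles, so bit n maps to a latency of n + 5 cycles.
MIN_LATENCY_BIT = 1
MIN_LATENCY_CYCLES = 6


def pattern_latencies(pattern, width=32):
    """Return the GCLK latency of every set bit in a latency pattern."""
    offset = MIN_LATENCY_CYCLES - MIN_LATENCY_BIT
    return [bit + offset for bit in range(width) if (pattern >> bit) & 1]


data = pattern_latencies(0x948)          # four pieces of data
tag = pattern_latencies(0x8, width=24)   # single tag sample
print(data, tag)
print(pattern_latencies(0x948 << 1)[0], pattern_latencies(0x948 >> 1)[0])
```

Under this reading, the four data samples are spaced 3, 2, and 3 GCLK cycles apart (an aggregate of 2.5X), and shifting the pattern left or right by one moves the first sample to nine or seven cycles, matching the text.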


The Cbox performs isolated tag read transactions in response to system probe com- 
mands. In addition, when using burst-mode SSRAMs, the Cbox can combine a separate 
tag read transaction with the tail end of a data read transaction, thus optimizing Bcache 
bandwidth. A Bcache tag read transaction proceeds exactly like a Bcache data read
transaction, except that only the BC_LAT_TAG_PATTERN is used to update the tag
shift register.


4.8.3.2 Bcache Data Write Transactions 


During a data write transaction, the 21264A always writes four pieces of data (64 bytes 
of data and 8 bytes of ECC) to the Bcache, and always writes the tag array during the 
first cycle. Once started, data write operations are never cancelled. Given the appropri- 
ate programming of the Bcache clock period and delay parameters to satisfy SSRAM 
setup/hold requirements of the Bcache address latch, a Bcache write transaction pro- 
ceeds through the Cbox as follows: 


1. The Cbox transmits the index and write control signals during an Int_Add_BcClk
edge.


2. The data is placed on Bcache data, tag, and tag status pins on the appropriate
Int_Data_BcClk edge from 0 to 7 Bcache bit-times later, based on the Cbox CSR
BC_LATE_WRITE_NUM[2:0]. The BC_LATE_WRITE_NUM[2:0] supports the
late-write SSRAMs, which optimize Bcache data bus bandwidth by minimizing
bubbles between read and write transactions. For example, single-data late-write
SSRAMs would need this CSR programmed to a value of one, and dual-data
late-write SSRAMs would need this CSR programmed to a value of two.


3. The difference between the data delivery (Int_Data_BcClk) and forwarded clocks 
out provides the setup for the data at the Bcache data flip-flop. 


4. For Bcache writes, the 21264A drivers are enabled on the GCLK BPHASE 
preceding the start of a write transfer, and disabled on the succeeding GCLK 
BPHASE at the end of a write transfer. Thus, the write data is enveloped by the 
21264A drivers to guarantee that every data transfer has the same data-valid 
window. 


4.8.3.3 Bubbles on the Bcache Data Bus 


When changing between read and write transactions on the bidirectional bus, it is often
necessary to introduce NOP cycles (bubbles) to allow the bus to settle and to drain the
Bcache read pipeline. The Cbox provides two CSRs, BC_RD_WR_BUBBLES[5:0]
and BC_WR_RD_BUBBLES[3:0], to help control the bubbles between read and write
transactions.


The optimum parameters for these CSRs are determined by formulas that include the 
following terms: 


Term     Description

bcfrm    Bcache frame clock.
         • In dual-data mode, bcfrm is twice the ratio.
         • In single-data mode, the value for bcfrm is determined by whether
           the ratio is even or odd:
           - When the ratio is even, bcfrm is equal to the ratio.
           - When the ratio is odd, bcfrm is twice the ratio.
         For example, in single-data mode:
           Ratio    bcfrm
           2        2
           2.5      5

GCLK     The processor clock.

Ratio    The number of GCLK cycles per peak Bcache bandwidth transfer. For
         example, a ratio of 2.5 means the peak Bcache bandwidth is 16 bytes
         for every 2.5 GCLK cycles.

rd_wr    The minimum spacing required between the read and write indices at
         the data/tag pins, expressed as GCLK cycles.

wr_rd    The minimum spacing required between the write and read indices at
         the data/tag pins, expressed as GCLK cycles.

The Relationship Between Write-to-Read - BC_WR_RD_BUBBLES and wr_rd


The following formulas calculate the relationship between the Cbox CSR
BC_WR_RD_BUBBLES and wr_rd:


wr_rd = (BC_WR_RD_BUBBLES - 1) * bcfrm

or

BC_WR_RD_BUBBLES = ((wr_rd + bcfrm - 1) / bcfrm) + 1

The division is an integer division, so the second formula rounds wr_rd up to a
whole number of bcfrm periods. There is never a need to use a value of 0 or 1 for
BC_WR_RD_BUBBLES.


If wr_rd = 4*ratio, then value 3 would be the minimum
BC_WR_RD_BUBBLES value when bcfrm = 2*ratio, and value 5 would be the
minimum BC_WR_RD_BUBBLES value when bcfrm = ratio.


There is a special case for ratio = 2.0 in single-data mode. In this case, the
formula is:


wr_rd = (BC_WR_RD_BUBBLES - 2) * bcfrm


The Relationship Between Read-to-Write — BC_RD_WR_BUBBLES and rd_wr 


Use the following formula to calculate the value for the Cbox CSR 
BC_RD_WR_BUBBLES that produces the minimum rd_wr restriction: 


BC_RD_WR_BUBBLES = rd_wr - 6 


Note that a value for BC_RD_WR_BUBBLES of zero really means 64 GCLK cycles.
In that case, amend the formula. For example, it is impossible to have rd_wr = 6 in
the 1.5X dual-data rate mode case.
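
The two bubble relationships in this section can be sketched as follows. This is a hedged Python illustration: the function names are hypothetical, bcfrm is taken as an input rather than derived, and the inverse used for the ratio = 2.0 single-data special case is obtained here by analogy with the general formula rather than quoted from the specification.

```python
def wr_rd_bubbles(wr_rd, bcfrm, ratio_2x_single_data=False):
    """Minimum BC_WR_RD_BUBBLES for a required wr_rd spacing (GCLK cycles)."""
    frames = (wr_rd + bcfrm - 1) // bcfrm     # round up to whole bcfrm periods
    # The general case adds 1; the special case (ratio = 2.0, single-data)
    # adds 2, derived from wr_rd = (BC_WR_RD_BUBBLES - 2) * bcfrm.
    return frames + (2 if ratio_2x_single_data else 1)


def rd_wr_bubbles(rd_wr):
    """BC_RD_WR_BUBBLES for a required rd_wr spacing (GCLK cycles)."""
    value = rd_wr - 6
    if not 1 <= value <= 63:
        # A field value of 0 is treated as 64 cycles, so rd_wr = 6 cannot
        # be expressed; values above 63 do not fit the 6-bit field.
        raise ValueError("rd_wr must be between 7 and 69 GCLK cycles")
    return value


print(wr_rd_bubbles(10, 5))   # ratio 2.5, wr_rd = 4*ratio, bcfrm = 2*ratio
print(wr_rd_bubbles(16, 4))   # ratio 4.0, wr_rd = 4*ratio, bcfrm = ratio
print(rd_wr_bubbles(10))
```

The first two calls correspond to the worked cases in the text (minimum values 3 and 5).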


4.8.4 Pin Descriptions 


This section describes the characteristics of the Bcache interface pins. 


4.8.4.1 BcAdd_H[23:4] 


The BcAdd_H[23:4] pins are high-drive outputs that provide the index for the Bcache.
The 21264A supports Bcache sizes of 1MB, 2MB, 4MB, 8MB, and 16MB. Table 4-42
lists the values to be programmed into Cbox CSRs BC_ENABLE[0] and
BC_SIZE[3:0] to support each size of the Bcache.


Table 4-42 Programming the Bcache to Support Each Size of the Bcache

BC_ENABLE[0]    BC_SIZE[3:0]    Bcache Size
1               0000            1MB
1               0001            2MB
1               0011            4MB
1               0111            8MB
1               1111            16MB
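
The BC_SIZE[3:0] encodings in Table 4-42 follow a simple pattern: the field value is the cache size in megabytes minus one. A small Python sketch of that observation (the function name is hypothetical):

```python
def bc_size_field(size_mb):
    """BC_SIZE[3:0] value for a Bcache of the given size, per Table 4-42."""
    if size_mb not in (1, 2, 4, 8, 16):
        raise ValueError("supported Bcache sizes are 1, 2, 4, 8, and 16 MB")
    return size_mb - 1


for size in (1, 2, 4, 8, 16):
    print(f"{size:2d}MB -> {bc_size_field(size):04b}")
```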

When the Cbox CSR BC_BANK_ENABLE[0] is not set, the unused BcAdd_H[23:4]
pins are tied to zero. For example, when configured as a 4MB cache, the 21264A never
changes BcAdd_H[23:22] from logic zero. When BC_BANK_ENABLE[0] is
asserted, the 21264A drives the complement of the MSB index on the next higher
BcAdd_H pin.


4.8.4.2 Bcache Control Pins 


The Bcache control pins (BcLoad_L, BcDataWr_L, BcDataOE_L, BcTagWr_L,
BcTagOE_L) are controlled using Cbox CSRs BC_BURST_MODE_ENABLE[0] and
BC_PENTIUM_MODE[0].


Table 4-43 shows the four combinations of Bcache control pin behavior obtained using 
the two CSRs. 


Table 4-43 Programming the Bcache Control Pins

BC_PENTIUM_MODE    BC_BURST_MODE_ENABLE    RAM_TYPE
0                  0                       RAM_TYPE A
0                  1                       RAM_TYPE B
1                  0                       Unsupported
1                  1                       Unsupported


Table 4—44 lists the combination of control pin assertion for RAM_TYPE A. 


Table 4-44 Control Pin Assertion for RAM_TYPE A

TYPE_A       NOP RA0 RA1 RA2 RA3 NOP NOP WA0 WA1 WA2 WA3 NOP
BcLoad_L     H   H   H   H   H   H   H   H   H   H   H   H
BcDataOE_L   H   L   L   L   L   H   H   L   L   L   L   H
BcDataWr_L   H   H   H   H   H   H   H   L   L   L   L   H
BcTagOE_L    H   L   H   H   H   H   H   L   H   H   H   H
BcTagWr_L    H   H   H   H   H   H   H   L   H   H   H   H


Table 4-45 lists the combination of control pin assertion for RAM_TYPE B. 


Table 4-45 Control Pin Assertion for RAM_TYPE B

TYPE_B       NOP RA0 RA1 RA2 RA3 NOP NOP WA0 WA1 WA2 WA3 NOP
BcLoad_L     H   L   H   H   H   H   H   L   H   H   H   H
BcDataOE_L   H   L   L   L   L   H   H   L   L   L   L   H
BcDataWr_L   L   H   H   H   H   L   L   L   L   L   L   L
BcTagOE_L    H   L   H   H   H   H   H   L   H   H   H   H
BcTagWr_L    H   H   H   H   H   H   H   L   H   H   H   H

Table 4-46 lists the combination of control pin assertion for RAM_TYPE C.


Table 4-46 Control Pin Assertion for RAM_TYPE C

TYPE_C       NOP RA0 RA1 RA2 RA3 NOP NOP WA0 WA1 WA2 WA3 NOP
BcLoad_L     H   H   H   H   H   H   H   H   H   H   H   H
BcDataOE_L   H   H   L   L   L   L   L   H   H   H   H   H
BcDataWr_L   H   H   H   H   H   H   H   L   L   L   L   H
BcTagOE_L    H   L   L   H   H   H   H   H   H   H   H   H
BcTagWr_L    H   H   H   H   H   H   H   L   H   H   H   H


Table 4—47 lists the combination of control pin assertion for RAM_TYPE D. 


Table 4-47 Control Pin Assertion for RAM_TYPE D

TYPE_D       NOP RA0 RA1 RA2 RA3 NOP NOP WA0 WA1 WA2 WA3 NOP
BcLoad_L     H   L   H   H   H   H   H   L   H   H   H   H
BcDataOE_L   H   H   L   L   L   L   L   H   H   H   H   H
BcDataWr_L   H   H   H   H   H   H   H   L   L   L   L   H
BcTagOE_L    H   H   L   L   H   H   H   H   H   H   H   H
BcTagWr_L    H   H   H   H   H   H   H   L   H   H   H   H


Notes: 


1. The NOP condition for RAM_TYPE B is consistent with bursting non-Pentium-style
SSRAMs.


2. In both RAM_TYPE A and RAM_TYPE B, the function of the pins BcDataOE_L and
BcTagOE_L changes from output-enable control to chip-select control.


3. In both RAM_TYPE C and RAM_TYPE D SSRAMs, the pins BcDataOE_L and
BcTagOE_L function as an asynchronous output enable that envelopes the Bcache
read data by providing an extra cycle of output enable.


Using these Cbox CSRs, late-write nonbursting and dual-data rate SSRAMs can be 
connected to the 21264A as described in Appendix E. 


4.8.4.3 BcDataInClk_H and BcTagInClk_H


The BcDataInClk_H[7:0] and BcTagInClk_H pins are used to capture data and
tag data from the Bcache data and tag RAMs, respectively. Dual-data rate SSRAMs provide
a clock output with the data output pins to minimize skew between the data and clock, 
thus allowing maximum bandwidth. The 21264A internally synchronizes the data to its 
GCLK with clock forward receive circuitry similar to that in the system interface. For 
nonDDR SSRAMs, systems can connect the Bcache data and tag output clock pins to 
the Bcache data and tag input clock pins. 

4.8.5 Bcache Banking


Bcache banking is possible by decoding the index MSB (as determined by Cbox CSR
BC_SIZE[3:0]) and asserting Cbox CSR BC_BANK_ENABLE[0]. To facilitate banking,
the 21264A provides the complement of the MSB bit in the next higher unused
index bit. For example, when configured as an 8MB cache with banking enabled, the
21264A drives the inversion of PA[22] on BcAdd_H[23] for use as a chip enable in a
banked configuration. Because there is no higher index bit available for 16MB caches,
this scheme only works for cache sizes of 1MB, 2MB, 4MB, and 8MB.
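
The banking rule above can be sketched as a lookup from cache size to the index MSB and the pin that carries its complement. This Python sketch is illustrative: the function name is hypothetical, and the bit numbers for sizes other than 8MB are extrapolated from the 8MB example (PA[22] inverted onto BcAdd_H[23]).

```python
def bank_select_bits(size_mb):
    """Return (index MSB address bit, BcAdd_H pin carrying its complement)."""
    if size_mb not in (1, 2, 4, 8):
        # A 16MB cache already uses BcAdd_H[23], so no higher pin is free.
        raise ValueError("banking is only available for 1, 2, 4, and 8 MB")
    msb = 19 + size_mb.bit_length() - 1   # 1MB -> PA[19] ... 8MB -> PA[22]
    return msb, msb + 1


msb, pin = bank_select_bits(8)
print(f"8MB: ~PA[{msb}] is driven on BcAdd_H[{pin}]")
```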


Setting BC_RD_RD_BUBBLE to 1 introduces one Bcache clock cycle of delay 
between consecutive read transactions, regardless of whether or not they are read trans- 
actions to the same bank. 


Setting BC_WR_WR_BUBBLE to 1 introduces one Bcache clock cycle of delay
between consecutive write transactions, regardless of whether or not they are write
transactions to the same bank.


Setting BC_SJ_BANK_ENABLE to 1 introduces one Bcache clock cycle of delay
between consecutive read transactions to a different bank (based on the MSB of the
index), even if BC_RD_RD_BUBBLE is set to 0. No additional delay is inserted
between consecutive read transactions to the same bank or between consecutive write
transactions.


4.8.6 Disabling the Bcache for Debugging 


The Bcache is a required component for a 21264A-based system. However, for debug 
purposes, the 21264A can be operated with the Bcache disabled. The Bcache can be 
disabled by clearing all of the BC_ENABLE bits in the Cbox WRITE_MANY CSR.
When disabling the Bcache, the following additional steps must be taken: 


1. The various Bcache control bits in the Cbox WRITE_ONCE chain must be pro- 
grammed to a valid combination (normally the same settings that would be used if 
the Bcache were enabled). 


2. The Bcache must still be initialized (using BC_INIT mode) during the reset PAL 
flow, after which the Bcache should be left disabled. 


3. Error Detection and Correction should be disabled by clearing DC_DAT_ERR_EN 
(bit 7 of the DC_CTL IPR), or the following bits in the Cbox WRITE_ONCE chain 
must be programmed to the indicated values: 


BC_CLK_DELAY[1:0] = 0x1
BC_CPU_CLK_DELAY[1:0] = 0x1
BC_CPU_LATE_WRITE_NUM[1:0] = 0x1
BC_LATE_WRITE_NUM[2:0] = 0x0
BC_LATE_WRITE_UPPER = 0
DUP_TAG_ENABLE = 0


4.9 Interrupts 


The system may request interrupts by way of the IRQ_H[5:0] pins. These six interrupt 
sources are identical. They may be asynchronous, are level sensitive, and can be indi- 
vidually masked by way of the EIE field of the CM_IER IPR. The system designer 
determines how these signals are used and selects their relative priority. 


5


Internal Processor Registers





This chapter describes 21264A internal processor registers (IPRs). They are separated 
into the following circuit logic groups: Ebox, Ibox, Mbox, and Cbox. 


The gray areas in register figures indicate reserved fields. Bit ranges that are coupled 
with the field name specify those bits in that named field that are included in the IPR. 
For example, in Figure 5-2, the field named COUNTER[31:4] contains bits 31 through 
4 of the COUNTER field from Section 5.1.1. The bit range of COUNTER[31:4] in the 
IPR is also listed in the column Extent in Table 5—2. In many cases, such as this one, the 
bit ranges correspond. However, the bit range of the named field need not always corre- 
spond to the Extent in the IPR. For example, in Figure 5-14, the field VA[47:13] resides 
in IPR IVA_FORM[37:3] under the stated conditions. 


The register contents after initialization are listed in Section 7.8. 
Table 5-1 lists the 21264A internal processor registers. 


Table 5-1 Internal Processor Registers

                                                  Index         Scoreboard          MT/MF Issued    Latency for
Register Name                       Mnemonic      (Binary)      Bit         Access  from Ebox Pipe  MFPR (Cycles)

Ebox IPRs
Cycle counter                       CC            1100 0000     5           RW      1L              1
Cycle counter control               CC_CTL        1100 0001     5           WO      1L              -
Virtual address                     VA            1100 0010     4,5,6,7     RO      1L              1
Virtual address control             VA_CTL        1100 0100     5           WO      1L              -
Virtual address format              VA_FORM       1100 0011     4,5,6,7     RO      1L              1

Ibox IPRs
ITB tag array write                 ITB_TAG       0000 0000     6           WO      0L              -
ITB PTE array write                 ITB_PTE       0000 0001     4,0         WO      0L              -
ITB invalidate all process (ASM=0)  ITB_IAP       0000 0010     4           WO      0L              -
ITB invalidate all                  ITB_IA        0000 0011     4           WO      0L              -
ITB invalidate single               ITB_IS        0000 0100     4,6         WO      0L              -
ProfileMe PC                        PMPC          0000 0101     -           RO      -               -
Exception address                   EXC_ADDR      0000 0110     -           RO      0L              3
Instruction VA format               IVA_FORM      0000 0111     5           RO      0L              3
Current mode                        CM            0000 1001     4           RW      0L              3
Interrupt enable                    IER           0000 1010     4           RW      0L              3
Interrupt enable and current mode   IER_CM        0000 10xx     4           RW      0L              3
Software interrupt request          SIRR          0000 1100     4           RW      0L              3
Interrupt summary                   ISUM          0000 1101     -           RO      -               -
Hardware interrupt clear            HW_INT_CLR    0000 1110     4           WO      0L              -
Exception summary                   EXC_SUM       0000 1111     -           RO      0L              3
PAL base address                    PAL_BASE      0001 0000     4           RW      0L              3
Ibox control                        I_CTL         0001 0001     4           RW      0L              3
Ibox status                         I_STAT        0001 0110     4           RW      0L              3
Icache flush                        IC_FLUSH      0001 0011     4           W       0L              -
Icache flush ASM                    IC_FLUSH_ASM  0001 0010     4           WO      0L              -
Clear virtual-to-physical map       CLR_MAP       0001 0101     4,5,6,7     WO      0L              -
Sleep mode                          SLEEP         0001 0111     4,5,6,7     WO      0L              -
Process context register            PCTX          01xn nnnn(1)  4           W       0L              3
Process context register            PCTX          01xx xxxx     4           R       0L              3
Performance counter control         PCTR_CTL      0001 0100     4           RW      0L              3

Mbox IPRs
DTB tag array write 0               DTB_TAG0      0010 0000     2,6         WO      0L              -
DTB tag array write 1               DTB_TAG1      1010 0000     1,5         WO      1L              -
DTB PTE array write 0               DTB_PTE0      0010 0001     0,4         WO      0L              -
DTB PTE array write 1               DTB_PTE1      1010 0001     3,7         WO      0L              -
DTB alternate processor mode        DTB_ALTMODE   0010 0110     6           WO      1L              -
DTB invalidate all process (ASM=0)  DTB_IAP       1010 0010     7           WO      1L              -
DTB invalidate all                  DTB_IA        1010 0011     7           WO      1L              -
DTB invalidate single (array 0)     DTB_IS0       0010 0100     6           WO      0L              -
DTB invalidate single (array 1)     DTB_IS1       1010 0100     7           WO      1L              -
DTB address space number 0          DTB_ASN0      0010 0101     4           WO      0L              -
DTB address space number 1          DTB_ASN1      1010 0101     7           WO      1L              -
Memory management status            MM_STAT       0010 0111     -           RO      0L              3
Mbox control                        M_CTL         0010 1000     6           WO      0L              -
Dcache control                      DC_CTL        0010 1001     6           WO      0L              -
Dcache status                       DC_STAT       0010 1010     6           RW      0L              3

Cbox IPRs
Cbox data                           C_DATA        0010 1011     6           RW      0L              3
Cbox shift control                  C_SHFT        0010 1100     6           WO      0L              -


(1) When n equals 1, that process context field is selected (FPE, PPCE, ASTRR, ASTER, ASN).


5.1 Ebox IPRs 


This section describes the internal processor registers that control Ebox functions. 


5.1.1 Cycle Counter Register — CC 


The cycle counter register (CC) is a read-write register. The lower half of CC is a 
counter that, when enabled by way of CC_CTL[32], increments once each CPU cycle. 
The upper half of the register is 32 bits of register storage that may be used as a counter 
offset as described in the Alpha Architecture Handbook, Version 4 under Processor Cycle 
Counter (PCC) Register. 


A HW_MTPR instruction to the CC writes the upper half of the register and leaves the 
lower half unchanged. The RPCC instruction returns the full 64-bit value of the register. 
Figure 5—1 shows the cycle counter register. 


Figure 5-1 Cycle Counter Register 


Bits [63:32]: OFFSET
Bits [31:0]:  COUNTER


5.1.2 Cycle Counter Control Register - CC_CTL


The cycle counter control register (CC_CTL) is a write-only register through which the
lower half of the CC register may be written and its associated counter enabled and
disabled. Figure 5-2 shows the cycle counter control register.


Figure 5-2 Cycle Counter Control Register 
Fields: CC_ENA, COUNTER[31:4]


Table 5-2 describes the CC_CTL register fields.


Table 5-2 Cycle Counter Control Register Fields Description

Name             Extent     Type    Description
Reserved         [63:33]    -       -
CC_ENA           [32]       WO      Counter Enable.
                                    When set, this bit allows the cycle counter to increment.
COUNTER[31:4]    [31:4]     WO      CC[31:4] may be written by way of this field. Write
                                    transactions to CC_CTL result in CC[3:0] being cleared.
Reserved         [3:0]      -       -
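
The field layout in Table 5-2 can be illustrated by composing a CC_CTL write value. This is a hedged Python sketch: the function and constant names are hypothetical, and it only models the bit packing, not the HW_MTPR instruction that would carry the value.

```python
CC_ENA = 1 << 32                 # Table 5-2: counter enable, bit [32]
COUNTER_MASK = 0xFFFF_FFF0       # Table 5-2: COUNTER[31:4], bits [31:4]


def cc_ctl_value(counter, enable):
    """Pack a CC_CTL write value; the low four counter bits cannot be set."""
    return (CC_ENA if enable else 0) | (counter & COUNTER_MASK)


print(hex(cc_ctl_value(0x1234_5678, enable=True)))
```

Masking the low four bits mirrors the statement that writes to CC_CTL leave CC[3:0] cleared.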


5.1.3 Virtual Address Register — VA 


The virtual address register (VA) is a read-only register. When a DTB miss or fault 
occurs, the associated effective virtual address is written into the VA register. VA is not 
written when a LD_VPTE gets a DTB miss or Dstream fault. Figure 5-3 shows the vir- 
tual address register. 


Figure 5-3 Virtual Address Register 


63 0 


VA[63:0]


5.1.4 Virtual Address Control Register - VA_CTL 


The virtual address control register (VA_CTL) is a write-only register that controls the
way in which the faulting virtual address stored in the VA register is formatted when it 
is read by way of the VA_FORM register. It also contains control bits that affect the 
behavior of the memory pipe virtual address sign extension checkers and the behavior 
of the Ebox extract, insert, and mask instructions. Figure 5—4 shows the virtual address 
control register. 


Figure 5-4 Virtual Address Control Register 
Fields: VPTB[63:30], VA_FORM_32, VA_48, B_ENDIAN


Table 5-3 describes the virtual address control register fields.


Table 5-3 Virtual Address Control Register Fields Description

Name           Extent     Type    Description
VPTB[63:30]    [63:30]    WO      Virtual Page Table Base.
                                  See the VA_FORM register section for details.
Reserved       [29:3]     -       -
VA_FORM_32     [2]        WO,0    This bit is used to control address formatting when
                                  reading the VA_FORM register. See the section on the
                                  VA_FORM register for details.
VA_48          [1]        WO,0    This bit controls the format applied to effective virtual
                                  addresses by the VA_FORM register and the memory pipe
                                  virtual address sign extension checkers. When VA_48 is
                                  clear, the 43-bit virtual address format is used, and
                                  when VA_48 is set, the 48-bit virtual address format is
                                  used.
                                  When VA_48 is set, the sign extension checkers generate
                                  an access control violation (ACV) if
                                  VA[63:0] ≠ SEXT(VA[47:0]).
                                  When VA_48 is clear, the sign extension checkers generate
                                  an ACV if VA[63:0] ≠ SEXT(VA[42:0]).
B_ENDIAN       [0]        WO,0    Big Endian Mode.
                                  When set, the shift amount (Rbv[2:0]) is inverted for
                                  EXTxx, INSxx, and MSKxx instructions. The lower bits of
                                  the physical address for Dstream accesses are inverted
                                  based upon the length of the reference as follows:
                                  Byte:     Invert bits [2:0]
                                  Word:     Invert bits [2:1]
                                  Longword: Invert bit [2]
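
The sign-extension check and the big-endian address inversion described in Table 5-3 can be modeled in a few lines. The Python sketch below is illustrative only; the function names are hypothetical, and the inversion table covers just the three reference lengths listed in the table.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to 64 bits."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value |= MASK64 & ~((1 << bits) - 1)
    return value


def va_causes_acv(va, va_48):
    """True if the sign extension checkers would flag this address."""
    bits = 48 if va_48 else 43
    return (va & MASK64) != sext(va, bits)


# Low physical address bits inverted per reference length (B_ENDIAN set).
INVERT_MASK = {1: 0b111, 2: 0b110, 4: 0b100}   # byte, word, longword


def big_endian_pa(pa, size_bytes):
    """Apply the big-endian inversion for a byte, word, or longword access."""
    return pa ^ INVERT_MASK[size_bytes]


print(va_causes_acv(1 << 42, va_48=False), va_causes_acv(1 << 42, va_48=True))
print(hex(big_endian_pa(0x1000, 1)))
```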


5.1.5 Virtual Address Format Register - VA_FORM 


The virtual address format register (VA_FORM) is a read-only register. It contains the 
virtual page table entry address derived from the faulting virtual address stored in the 
VA register. It also contains the virtual page table base and associated control bits stored 
in the VA_CTL register. 


Figure 5-5 shows VA_FORM when VA_CTL(VA_48) equals 0 and 
VA_CTL(VA_FORM_32) equals 0. 


Figure 5-5 Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0) 






Fields: VPTB[63:33], VA[42:13]


Figure 5-6 shows VA_FORM when VA_CTL(VA_48) equals 1 and 
VA_CTL(VA_FORM_32) equals 0. 

Figure 5-6 Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0)

Fields: VPTB[63:43], SEXT(VA[47]), VA[47:13]


Figure 5-7 shows VA_FORM when VA_CTL(VA_48) equals 0 and
VA_CTL(VA_FORM_32) equals 1.

Figure 5-7 Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1)

Bit boundaries: 63, 30|29, 22|21, 3|2, 0
Fields: VPTB[63:30]


5.2 Ibox IPRs


This section describes the internal processor registers that control Ibox functions. 


5.2.1 ITB Tag Array Write Register — ITB_TAG 


The ITB tag array write register (ITB_TAG) is a write-only register. The ITB tag array 
is written by way of this register. A write transaction to ITB_TAG writes a register out- 
side the ITB array. When a write to the ITB_PTE register is retired, the contents of both 
the ITB_TAG and ITB_PTE registers are written into the ITB entry. The specific ITB 
entry that is written is determined by a round-robin algorithm; the algorithm writes to
entry number 0 as the first entry after the 21264A is reset. Figure 5—8 shows the ITB tag 
array write register. 


Figure 5-8 ITB Tag Array Write Register 
Fields: VA[47:13]


5.2.2 ITB PTE Array Write Register — ITB_PTE 


The ITB PTE array write register (ITB_PTE) is a write-only register through which the 
ITB PTE array is written. A round-robin allocation algorithm is used. A write to the 
ITB_PTE array, when retired, results in both the ITB_TAG and ITB_PTE arrays being 
written. The specific entry that is written is chosen by the round-robin algorithm 
described above. Figure 5—9 shows the ITB PTE array write register. 

Figure 5-9 ITB PTE Array Write Register


5.2.3 ITB Invalidate All Process (ASM=0) Register - ITB_IAP


The ITB invalidate all process register (ITB_IAP) is a pseudo register that, when
written to, invalidates all ITB entries whose ASM bit is clear. An explicit write to
IC_FLUSH_ASM is required to flush the Icache of blocks with ASM equal to zero.


5.2.4 ITB Invalidate All Register — ITB_IA 


The ITB invalidate all register (ITB_IA) is a pseudo register that, when written to, 
invalidates all ITB entries and resets the allocation pointer to its initial state. An 
explicit write to IC_FLUSH is required to flush the Icache. 


5.2.5 ITB Invalidate Single Register - ITB_IS 


The ITB invalidate single register (ITB_IS) is a write-only register. Writing a virtual
page number to this register invalidates any ITB entry that meets one of the following
criteria:


• The ITB entry's virtual page number matches ITB_IS[47:13] (or fewer bits if
granularity hint bits are set in the ITB entry) and its ASN field matches the address
space number supplied in PCTX[46:39].


• The ITB entry's virtual page number matches ITB_IS[47:13] and its ASM bit is set.
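
The two invalidation criteria above can be expressed as a small predicate. This Python sketch is a simplified model under stated assumptions: the function and parameter names are hypothetical, and granularity hints (which reduce the number of compared bits) are deliberately ignored.

```python
VPN_MASK = (1 << 35) - 1          # 35 bits: VA[47:13]


def itb_is_invalidates(entry_vpn, entry_asn, entry_asm, itb_is, pctx_asn):
    """True if a write of itb_is would invalidate the given ITB entry.

    Simplification: granularity hint bits are not modeled, so all of
    bits [47:13] are compared.
    """
    written_vpn = (itb_is >> 13) & VPN_MASK
    if entry_vpn != written_vpn:
        return False
    # Either the ASN matches the one supplied in PCTX, or the entry
    # has its ASM (address space match) bit set.
    return bool(entry_asm) or entry_asn == pctx_asn


print(itb_is_invalidates(0x5, 3, False, 0x5 << 13, 3))
```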


Figure 5—10 shows the ITB invalidate single register. 


Figure 5-10 ITB Invalidate Single Register 





Fields: INVAL_ITB[47:13]


Note: Because the Icache is virtually indexed and tagged, it is normally not
necessary to flush the Icache when paging. Therefore, a write to ITB_IS will
not flush the Icache.


5.2.6 ProfileMe PC Register - PMPC


The ProfileMe PC register (PMPC) is a read-only register that contains the PC of the 
last profiled instruction. Additional information is available in the I_STAT and 
PCTR_CTL register descriptions. 


Usage of PMPC in performance monitoring is described in Section 6.10. 


Figure 5—11 shows the ProfileMe PC register. 


Figure 5-11 ProfileMe PC Register 







Table 5-4 describes the ProfileMe PC register fields. 


Table 5-4 ProfileMe PC Fields Description

Name        Extent    Type    Description
PC[63:2]    [63:2]    RO      Address of the profiled instruction
Reserved    [1]       RO      Read as zero
PAL         [0]       RO      Indicates that the PC field contains a physical-mode
                              PALmode address


5.2.7 Exception Address Register - EXC_ADDR 


The exception address register (EXC_ADDR) is a read-only register that is updated by 
hardware when it encounters an exception or interrupt. 


EXC_ADDR[0] is set if the associated exception occurred in PALmode. The exception 
actions are listed here: 


• If the exception was a fault or a synchronous trap, EXC_ADDR contains the PC of
  the instruction that triggered the fault or trap.


• If the exception was an interrupt, EXC_ADDR contains the PC of the next
  instruction that would have executed if the interrupt had not occurred.


Figure 5-12 shows the exception address register. 


Figure 5-12 Exception Address Register 


5.2.8 Instruction Virtual Address Format Register - IVA_FORM


The instruction virtual address format register (IVA_FORM) is a read-only register. It 
contains the virtual PTE address derived from the faulting virtual address stored in the 
EXC_ADDR register, and from the virtual page table base, VA_48 and VA_FORM_32 
bits, stored in the I_CTL register. 


Figure 5-13 shows IVA_FORM when I_CTL(VA_48) equals 0 and
I_CTL(VA_FORM_32) equals 0.


Figure 5-13 Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0)

[Figure fields: VPTB[63:33], VA[42:13]]


Figure 5-14 shows IVA_FORM when I_CTL(VA_48) equals 1 and
I_CTL(VA_FORM_32) equals 0.


Figure 5-14 Instruction Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0)

[Figure fields: VPTB[63:43], SEXT(VA[47]), VA[47:13]; field boundaries fall at bits 43/42 and 38/37]


Figure 5-15 shows IVA_FORM when I_CTL(VA_48) equals 0 and
I_CTL(VA_FORM_32) equals 1.


Figure 5-15 Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1)

[Figure fields: VPTB[63:30], VA[31:13]]
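

The three formats can be sketched as one function that combines the page table base with the faulting address. This is an illustrative model under stated assumptions, not a definitive description of the hardware: the function names are hypothetical, the page number field is assumed to start at bit 3 (8-byte PTEs) with bits [2:0] reading as zero, unused bits between the fields are assumed to read as zero, and VA_FORM_32 is assumed to take priority when both control bits are set (a combination the figures do not show).

```python
MASK64 = (1 << 64) - 1


def bits(value, hi, lo):
    """Return value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def iva_form(vptb, va, va_48, va_form_32):
    """Model the IVA_FORM value for faulting address va and base vptb."""
    if va_form_32:                        # Figure 5-15: VPTB[63:30], VA[31:13]
        base = vptb & ~((1 << 30) - 1)
        index = bits(va, 31, 13)
    elif va_48:                           # Figure 5-14: VPTB[63:43], SEXT(VA[47]), VA[47:13]
        base = vptb & ~((1 << 43) - 1)
        index = bits(va, 47, 13)
        if bits(va, 47, 47):              # replicate VA[47] into result bits [42:38]
            index |= 0x1F << 35
    else:                                 # Figure 5-13: VPTB[63:33], VA[42:13]
        base = vptb & ~((1 << 33) - 1)
        index = bits(va, 42, 13)
    # Assumption: the page number is placed at bit 3; bits [2:0] read as zero.
    return (base | (index << 3)) & MASK64
```

For example, with VA_48 and VA_FORM_32 both clear, a faulting address in the second 8 KB page yields an offset of 8 bytes from the base, which is the second PTE in the table.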


5.2.9 Interrupt Enable and Current Processor Mode Register — IER_CM 


The interrupt enable and current processor mode register (IER_CM) contains the
interrupt enable and current processor mode bit fields. These bit fields can be written
either individually or together with a single HW_MTPR instruction. When bits [7:2] of
the IPR index field of a HW_MTPR instruction contain the value 000010 (binary), this
register is selected. Bits [1:0] of the IPR index indicate which bit fields are to be written:
bit[1] corresponds to the IER field and bit[0] corresponds to the processor mode field. A
HW_MFPR instruction to this register returns the values in both fields. Figure 5-16
shows the interrupt enable and current processor mode register.
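
The index encoding described above can be sketched as follows. `ier_cm_index` is a hypothetical helper name used for illustration; only the bit positions come from the text.

```python
IER_CM_INDEX_BASE = 0b000010 << 2   # bits [7:2] of the IPR index select IER_CM


def ier_cm_index(write_ier, write_cm):
    """Build the HW_MTPR IPR index that writes the chosen IER_CM fields."""
    # bit[1] selects the IER field, bit[0] selects the processor mode field
    return IER_CM_INDEX_BASE | (int(write_ier) << 1) | int(write_cm)
```

Under this encoding, writing both fields at once uses index 0x0B, writing only the interrupt enables uses 0x0A, and writing only the current mode uses 0x09.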


Figure 5-16 Interrupt Enable and Current Processor Mode Register

[Figure fields shown include SIEN[15:1] and ASTEN; see Table 5-5 for the full layout]


Table 5-5 describes the interrupt enable and current processor mode register fields. 


Table 5-5 IER_CM Register Fields Description


Name        Extent   Type  Description

Reserved    [63:39]  -     -

EIEN[5:0]   [38:33]  RW    External Interrupt Enable

SLEN        [32]     RW    Serial Line Interrupt Enable

CREN        [31]     RW    Corrected Read Error Interrupt Enable

PCEN[1:0]   [30:29]  RW    Performance Counter Interrupt Enables

SIEN[15:1]  [28:14]  RW    Software Interrupt Enables

ASTEN       [13]     RW    AST Interrupt Enable
                           When set, enables those AST interrupt requests that are also
                           enabled by the value in ASTER.

Reserved    [12:5]   -     -

CM[1:0]     [4:3]    RW    Current Mode
                           00 Kernel
                           01 Executive
                           10 Supervisor
                           11 User

Reserved    [2:0]    -     -


5.2.10 Software Interrupt Request Register — SIRR 


The software interrupt request register (SIRR) is a read-write register containing bits to
request software interrupts. To generate a particular software interrupt, its
corresponding bits in SIRR and IER[SIER] must both be set. Figure 5-17 shows the software
interrupt request register.


Figure 5-17 Software Interrupt Request Register

[Figure field: SIR[15:1]]


Table 5-6 describes the software interrupt request register fields. 


Table 5-6 Software Interrupt Request Register Fields Description 








Name       Extent   Type  Description

Reserved   [63:29]  -     -

SIR[15:1]  [28:14]  RW    Software Interrupt Requests

Reserved   [13:0]   -     -
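
The request-and-enable rule can be illustrated with a short sketch. It assumes that the enable bits referred to as IER[SIER] are the SIEN[15:1] field of IER_CM, which occupies the same bit positions [28:14] as SIR[15:1]; `pending_software_levels` is a hypothetical name.

```python
SW_FIELD_SHIFT = 14
SW_FIELD_MASK = 0x7FFF << SW_FIELD_SHIFT   # bits [28:14] in both SIRR and IER_CM


def pending_software_levels(sirr, ier_cm):
    """Return software interrupt levels (1..15) that are requested and enabled."""
    active = (sirr & ier_cm & SW_FIELD_MASK) >> SW_FIELD_SHIFT
    # Bit 0 of `active` corresponds to level 1, bit 14 to level 15.
    return [level for level in range(1, 16) if active & (1 << (level - 1))]
```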


5.2.11 Interrupt Summary Register — ISUM 


The interrupt summary register (ISUM) is a read-only register that records all pending 
hardware, software, and AST interrupt requests that have their corresponding enable bit 
set. 


If a new interrupt (hardware, serial line, crd, or performance counters) occurs
simultaneously with an ISUM read, the ISUM read returns zeros. That condition is normally
assumed to be a passive release condition. The interrupt is signaled again when the
PALcode returns to native mode. The effects of this condition can be minimized by
reading ISUM twice and ORing the results.
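
The double-read workaround can be sketched as below. `read_isum` is a hypothetical callable that stands in for a HW_MFPR of ISUM; the sketch only shows the combining step.

```python
def read_isum_twice(read_isum):
    """Read ISUM twice and OR the results, as suggested in the text.

    read_isum is a hypothetical zero-argument callable returning the
    current ISUM value.
    """
    first = read_isum()
    second = read_isum()
    return first | second
```

If the first read collides with a newly arriving interrupt and returns zeros, the second read still contributes the pending bits to the result.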


Usage of ISUM in performance monitoring is described in Section 6.10. Figure 5-18 
shows the interrupt summary register. 


Figure 5-18 Interrupt Summary Register 


Table 5-7 describes the interrupt summary register fields.


Table 5-7 Interrupt Summary Register Fields Description


Name        Extent    Type  Description

Reserved    [63:39]   -     -

EI[5:0]     [38:33]   RO    External Interrupts

SL          [32]      RO    Serial Line Interrupt

CR          [31]      RO    Corrected Read Error Interrupts

PC[1:0]     [30:29]   RO    Performance Counter Interrupts
                            PC0 when PC[0] is set.
                            PC1 when PC[1] is set.

SI[15:1]    [28:14]   RO    Software Interrupts

Reserved    [13:11]   -     -

ASTU, ASTS  [10],[9]  RO    AST Interrupts
                            For each processor mode, the bit is set if an associated AST
                            interrupt is pending. This includes the mode's ASTER and
                            ASTRR bits and whether the processor mode value held in the
                            IER_CM register is greater than or equal to the value for the
                            mode.

Reserved    [8:5]     -     -

ASTE, ASTK  [4],[3]   RO    AST Interrupts
                            For each processor mode, the bit is set if an associated AST
                            interrupt is pending. This includes the mode's ASTER and
                            ASTRR bits and whether the processor mode value held in the
                            IER_CM register is greater than or equal to the value for the
                            mode.

Reserved    [2:0]     -     -


5.2.12 Hardware Interrupt Clear Register - HW_INT_CLR 


The hardware interrupt clear register (HW_INT_CLR) is a write-only register used to
clear edge-sensitive interrupt requests. See Section D.31 for more information about the
PALcode restriction concerning this register. Figure 5-19 shows the hardware interrupt
clear register.


Figure 5-19 Hardware Interrupt Clear Register


Table 5-8 describes the hardware interrupt clear register fields.


Table 5-8 Hardware Interrupt Clear Register Fields Description


Name      Extent   Type  Description

Reserved  [63:33]  -     -

SL        [32]     W1C   Clears serial line interrupt request

CR        [31]     W1C   Clears corrected read error interrupt request

PC[1:0]   [30:29]  W1C   Clears performance counter interrupt requests

MCHK_D    [28]     W1C   Clears Dstream machine check interrupt request

Reserved  [27]     -     -

FBTP      [26]     W1S   Forces the next Bcache hit that fills the Icache to generate bad
                         Icache fill parity

Reserved  [25:0]   -     -
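
A write value for HW_INT_CLR can be assembled from the bit positions in Table 5-8. The sketch below is illustrative only: the names `HW_INT_CLR_BITS` and `hw_int_clr_value` are hypothetical, and it assumes that PC[1] and PC[0] occupy bits 30 and 29 respectively.

```python
HW_INT_CLR_BITS = {
    "SL": 32,       # serial line interrupt request
    "CR": 31,       # corrected read error interrupt request
    "PC1": 30,      # performance counter 1 (assumed to be the upper bit of PC[1:0])
    "PC0": 29,      # performance counter 0
    "MCHK_D": 28,   # Dstream machine check interrupt request
    "FBTP": 26,     # force bad Icache fill parity (write-1-to-set)
}


def hw_int_clr_value(*names):
    """Return the value to write to HW_INT_CLR for the named fields."""
    value = 0
    for name in names:
        value |= 1 << HW_INT_CLR_BITS[name]
    return value
```

For example, `hw_int_clr_value("SL", "PC0")` builds a value that clears the serial line and performance counter 0 requests in one write.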


5.2.13 Exception Summary Register - EXC_SUM 


The exception summary register (EXC_SUM) is a read-only register that contains 
information about instructions that have triggered traps. The register is updated at trap 
delivery time. Its contents are valid only if it is read (by way of a HW_MFPR) in the 
first fetch block of the exception handler. There are three types of traps for which this 
register captures related information: 


• Arithmetic traps: The instruction generated an exceptional condition that should be
  reported to the operating system, and/or the FPCR status bit associated with this
  condition is clear and should be set by PALcode. Additionally, the REG field
  contains the register number of the destination specifier for the instruction that
  triggered the trap.


• Istream ACV: The BAD_IVA bit of this register indicates whether the offending
  Istream virtual address is latched into the EXC_ADDR register or the VA register.


• Dstream exceptions: The REG field contains the register number of either the
  source specifier (for stores) or the destination specifier (for loads) of the instruction
  that triggered the trap.


Figure 5-20 shows the exception summary register. 


Figure 5-20 Exception Summary Register

[Figure fields, high to low: SEXT(SET_IOV), SET_IOV, SET_INE, SET_UNF, SET_OVF, SET_DZE,
SET_INV, PC_OVFL, BAD_IVA, REG[4:0], INT, IOV, INE, UNF, FOV, DZE, INV, SWC]


Table 5-9 describes the exception summary register fields.


Table 5-9 Exception Summary Register Fields Description


Name           Extent   Type  Description

SEXT(SET_IOV)  [63:48]  RO,0  Sign-extended value of bit 47, SET_IOV.

SET_IOV        [47]     RO    PALcode should set FPCR[IOV].

SET_INE        [46]     RO    PALcode should set FPCR[INE].

SET_UNF        [45]     RO    PALcode should set FPCR[UNF].

SET_OVF        [44]     RO    PALcode should set FPCR[OVF].

SET_DZE        [43]     RO    PALcode should set FPCR[DZE].

SET_INV        [42]     RO    PALcode should set FPCR[INV].

PC_OVFL        [41]     RO    Indicates that EXC_ADDR was improperly sign extended for
                              48-bit mode over/underflow IACV.

Reserved       [40:14]  RO,0  Reserved for COMPAQ.

BAD_IVA        [13]     RO    Bad Istream VA.
                              This bit should be used by the IACV PALcode routine to
                              determine whether the offending I-stream virtual address is
                              latched in the EXC_ADDR register or the VA register. If
                              BAD_IVA is clear, EXC_ADDR contains the address; if BAD_IVA
                              is set, VA contains the address.

REG[4:0]       [12:8]   RO    Destination register of load or operate instruction that
                              triggered the trap OR source register of store that triggered
                              the trap. These bits may contain the Rc field of an operate
                              instruction or the Ra field of a load or store instruction.
                              The value is UNPREDICTABLE if the trap was triggered by an
                              ITB miss, interrupt, OPCDEC, or other non load/store/operate
                              instruction.

INT            [7]      RO    Set to indicate Ebox integer overflow trap, clear to indicate
                              Fbox trap condition.

IOV            [6]      RO    Indicates Fbox convert-to-integer overflow or Ebox integer
                              overflow trap.

INE            [5]      RO    Indicates floating-point inexact error trap.

UNF            [4]      RO    Indicates floating-point underflow trap.

FOV            [3]      RO    Indicates floating-point overflow trap.

DZE            [2]      RO    Indicates divide by zero trap.

INV            [1]      RO    Indicates invalid operation trap.

SWC            [0]      RO    Indicates software completion possible. This bit is set if the
                              instruction that triggered the trap contained the /S modifier.
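
The field layout in Table 5-9 lends itself to a small decoder. The following sketch is illustrative and uses hypothetical names (`EXC_SUM_BITS`, `decode_exc_sum`, `iacv_fault_address`); the bit positions and the BAD_IVA rule are taken from the table.

```python
EXC_SUM_BITS = {
    "SET_IOV": 47, "SET_INE": 46, "SET_UNF": 45, "SET_OVF": 44,
    "SET_DZE": 43, "SET_INV": 42, "PC_OVFL": 41, "BAD_IVA": 13,
    "INT": 7, "IOV": 6, "INE": 5, "UNF": 4, "FOV": 3, "DZE": 2,
    "INV": 1, "SWC": 0,
}


def decode_exc_sum(value):
    """Return the single-bit flags and the REG field of an EXC_SUM value."""
    fields = {name: bool((value >> bit) & 1) for name, bit in EXC_SUM_BITS.items()}
    fields["REG"] = (value >> 8) & 0x1F        # REG[4:0] in bits [12:8]
    return fields


def iacv_fault_address(exc_sum, exc_addr, va):
    """Pick the offending Istream address for an IACV, following BAD_IVA."""
    # BAD_IVA clear: EXC_ADDR holds the address; BAD_IVA set: VA holds it.
    return va if (exc_sum >> 13) & 1 else exc_addr
```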


5.2.14 PAL Base Register - PAL_BASE 


The PAL base register (PAL_BASE) is a read-write register that contains the base phys- 
ical address for PALcode. Its contents are cleared by chip reset but are not cleared after 
waking up from sleep mode or from fault reset. Figure 5-21 shows the PAL base regis- 
ter. 


Figure 5-21 PAL Base Register 


[Figure field: PAL_BASE[43:15] in bits [43:15]; bits [63:44] and [14:0] are reserved]


Table 5-10 describes the PAL base register fields. 


Table 5-10 PAL Base Register Fields Description 





Name Extent Type Description 

Reserved [63:44] RO, 0 Reserved for COMPAQ. 
PAL_BASE[43:15] [43:15] RW Base physical address for PALcode. 
Reserved [14:0] RO, 0 Reserved for COMPAQ. 
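
Because only bits [43:15] are writable, a candidate PALcode base address can be checked against the field mask before it is written. This is a minimal sketch with hypothetical names, assuming the address is handled as a non-negative integer.

```python
# Bits [43:15] are the only writable bits of PAL_BASE (Table 5-10).
PAL_BASE_MASK = ((1 << 44) - 1) & ~((1 << 15) - 1)


def is_valid_pal_base(phys_addr):
    """Return True if phys_addr fits entirely within PAL_BASE[43:15]."""
    return phys_addr >= 0 and (phys_addr & ~PAL_BASE_MASK) == 0
```

In this model the base must be aligned to 2^15 bytes (32 KB) and lie below 2^44.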


5.2.15 Ibox Control Register - I_CTL


The Ibox control register (I_CTL) is a read-write register that controls various Ibox
functions. Its contents are cleared by chip reset. Figure 5-22 shows the Ibox control
register.
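
One of the functions controlled by I_CTL is the virtual address sign extension check selected by the VA_48 bit, described in Table 5-11: an ACV is generated if va[63:0] differs from SEXT(va[47:0]) when VA_48 is set, or from SEXT(va[42:0]) when VA_48 is clear. The sketch below models that comparison only; `sext` and `ibox_va_causes_acv` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def sext(value, width):
    """Sign-extend the low `width` bits of value to 64 bits."""
    value &= (1 << width) - 1
    if value >> (width - 1):                   # sign bit of the narrow value
        value |= MASK64 & ~((1 << width) - 1)  # fill the upper bits with ones
    return value


def ibox_va_causes_acv(va, va_48):
    """Return True if va fails the sign extension check for the chosen format."""
    width = 48 if va_48 else 43
    return (va & MASK64) != sext(va, width)
```

An address with bit 42 set and all higher bits clear fails the 43-bit check but passes the 48-bit check, which is the distinction the VA_48 bit controls.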


Figure 5-22 Ibox Control Register

[Figure fields, high to low: SEXT(VPTB[47]), VPTB[47:30], CHIP_ID[5:0], BIST_FAIL, TB_MB_EN,
MCHK_EN, ST_WAIT_64K, PCT1_EN, PCT0_EN, SINGLE_ISSUE_H, VA_FORM_32, VA_48, SL_RCV, SL_XMIT,
HWE, BP_MODE[1:0], SBE[1:0], SDE[1:0], SPE[2:0], IC_EN[1:0], SPCE]


Table 5-11 describes the Ibox control register fields.


Table 5-11 Ibox Control Register Fields Description


Name            Extent   Type  Description

SEXT(VPTB[47])  [63:48]  RW,0  Sign extended VPTB[47].

VPTB[47:30]     [47:30]  RW,0  Virtual Page Table Base. See Section 5.1.5 for details.

CHIP_ID[5:0]    [29:24]  RO    This is a read-only field that supplies the revision ID
                               number for the 21264A part (values in binary).
                               21264A pass 2.1 ID is 001000.
                               21264A pass 2.1.1 ID is 001010.
                               21264A pass 2.1.2 ID is 001101.
                               21264A pass 2.2 ID is 001001.
                               21264A pass 2.2.1 ID is 001011.
                               21264A pass 2.2.2 ID is 001110.
                               21264A pass 2.3 ID is 001100.

BIST_FAIL       [23]     RO,0  Indicates the status of BIST (set = pass, clear = fail),
                               described in Section 11.5.1.

TB_MB_EN        [22]     RW,0  When set, the hardware ensures that the virtual-mode loads
                               in DTB and ITB fill flows that access the page table and the
                               subsequent virtual mode load or store that is being retried
                               are 'ordered' relative to another processor's stores. This
                               must be set for multiprocessor systems in which no MB
                               instruction is present in the TB fill flow, unless there are
                               other mechanisms present that ensure coherency.

MCHK_EN         [21]     RW,0  Machine check enable. Set to enable machine checks.

ST_WAIT_64K     [20]     RW,0  The stWait table is used to reduce load/store order traps.
                               When set, the stWait table is cleared after 64K cycles. When
                               clear, the stWait table is cleared after 16K cycles. See
                               Section 2.11.

PCT1_EN         [19]     RW,0  Enable performance counter #1. If this bit is one, the
                               performance counter will count if either the system (SPCE)
                               or process (PPCE) performance counter enable is asserted.

PCT0_EN         [18]     RW,0  Enable performance counter #0. If this bit is one, the
                               performance counter will count if either the system (SPCE)
                               or process (PPCE) performance counter enable is set.

SINGLE_ISSUE_H  [17]     RW,0  When set, this bit forces instructions to issue only from
                               the bottom-most entries of the IQ and FQ.

VA_FORM_32      [16]     RW,0  This bit controls address formatting on a read of the
                               IVA_FORM register.

VA_48           [15]     RW,0  This bit controls the format applied to effective virtual
                               addresses by the IVA_FORM register and the Ibox virtual
                               address sign extension checkers. When VA_48 is clear, 43-bit
                               virtual address format is used, and when VA_48 is set, 48-bit
                               virtual address format is used. The effect of this bit on the
                               IVA_FORM register is identical to the effect of
                               VA_CTL[VA_48] on the VA_FORM register. See Section 5.1.5.

                               When VA_48 is set, the sign extension checkers generate an
                               ACV if va[63:0] ≠ SEXT(va[47:0]). When VA_48 is clear, the
                               sign extension checkers generate an ACV if
                               va[63:0] ≠ SEXT(va[42:0]).

                               This bit also affects DTB_DOUBLE traps. If set, the DTB
                               double miss traps vector to the DTB_DOUBLE_4 entry point.

                               DTB_DOUBLE PALcode flow selection is not affected by
                               VA_CTL[VA_48].

SL_RCV          [14]     RO    See Section 11.2.

SL_XMIT         [13]     WO    When set, drives a value on SromClk_H. See Section 11.2.

HWE             [12]     RW,0  If set, allow PALRES instructions to be executed in kernel
                               mode. Note that modification of the ITB while in kernel
                               mode/native mode may cause UNPREDICTABLE behavior.

BP_MODE[1:0]    [11:10]  RW,0  Branch Prediction Mode Selection.
                               BP_MODE[1], if set, forces all branches to be predicted to
                               fall through. If clear, the dynamic branch predictor is
                               chosen.
                               BP_MODE[0], if set, causes the dynamic branch predictor to
                               choose local history prediction. If clear, the dynamic branch
                               predictor chooses local or global prediction based on the
                               state of the chooser.

SBE[1:0]        [9:8]    RW,0  Stream Buffer Enable.
                               The value in this bit field specifies the number of Istream
                               buffer prefetches (besides the demand-fill) that are launched
                               after an Icache miss. If the value is zero, only demand
                               requests are launched.

SDE[1:0]        [7:6]    RW,0  PALshadow Register Enable.
                               Enables access to the PALshadow registers. If SDE[1] is set,
                               R4-R7 and R20-R23 are used as PALshadow registers.
                               SDE[0] does not affect 21264A operation.

SPE[2:0]        [5:3]    RW,0  Super Page Mode Enable.
                               Identical to the SPE bits in the Mbox M_CTL SPE[2:0]. See
                               Section 5.3.9.

IC_EN[1:0]      [2:1]    RW,3  Icache Set Enable.
                               At least one set must be enabled. The entire cache may be
                               enabled by setting both bits. Zero, one, or two Icache sets
                               can be enabled.
                               This bit does not clear the Icache, but only disables fills to
                               the affected set.

SPCE            [0]      RW,0  System Performance Counting Enable.
                               Enables performance counting for the entire system if
                               individual counters (PCTR0 or PCTR1) are enabled by setting
                               PCT0_EN or PCT1_EN, respectively.
                               Performance counting for individual processes can be enabled
                               by setting PCTX[PPCE]. See Section 5.2.21 for more
                               information.
                               See Section 6.10 for information about performance counting.


5.2.16 Ibox Status Register - I_STAT


The Ibox status register (I_STAT) is a read/write-1-to-clear register that contains Ibox
status information.


Usage of I_STAT in performance monitoring is described in Section 6.10.


Figure 5-23 shows the Ibox status register.


Figure 5-23 Ibox Status Register

[Figure fields, high to low: MIS, TRP, LSO, TRAP_TYPE[3:0], ICM, OVR[2:0], PAR; see Table 5-12
for bit positions]





Table 5-12 describes the Ibox status register fields.


Table 5-12 Ibox Status Register Fields Description


Name            Extent   Type  Description

Reserved        [63:41]  RO    Reserved for COMPAQ.

MIS             [40]     RO    ProfileMe Mispredict Trap.
                               If the I_STAT[TRP] bit is set, this bit indicates that the
                               profiled instruction caused a mispredict trap.
                               JSR/JMP/RET/COR or HW_JSR/HW_JMP/HW_RET/HW_COR mispredicts
                               do not set this bit but can be recognized by the presence of
                               one of these instructions at the PMPC location with the
                               I_STAT[TRP] bit set. This identification is exact in all
                               cases except error condition traps. Hardware corrected Icache
                               parity or Dcache ECC errors, and machine check traps can
                               occur on any instruction in the pipeline.

TRP             [39]     RO    ProfileMe Trap.
                               This bit indicates that the profiled instruction caused a
                               trap. The trap type field, PMPC register, and instruction at
                               the PMPC location are needed to distinguish all trap types.

LSO             [38]     RO    ProfileMe Load-Store Order Trap.
                               If the profiled instruction caused a replay trap, this bit
                               indicates that the precise trap cause was an Mbox load-store
                               order replay trap.
                               If clear, this bit indicates that the replay trap was any one
                               of the following:
                                 Mbox load-load order
                                 Mbox load queue full
                                 Mbox store queue full
                                 Mbox wrong size trap (such as, STL -> LDQ)
                                 Mbox Bcache alias (2 physical addresses map to same
                                 Bcache line)
                                 Mbox Dcache alias (2 physical addresses map to same
                                 Dcache line)
                                 Icache parity error
                                 Dcache ECC error

TRAP_TYPE[3:0]  [37:34]  RO    ProfileMe Trap Types.
                               If the profiled instruction caused a trap (indicated by
                               I_STAT[TRP]), this field indicates the trap type as listed
                               here:

                               Value  Trap Type
                               0      Replay
                               1      Invalid (unused)
                               2      DTB Double miss (3 level page tables)
                               3      DTB Double miss (4 level page tables)
                               4      Floating point disabled
                               5      Unaligned Load/Store
                               6      DTB Single miss
                               7      Dstream Fault
                               8      OPCDEC
                               9      Invalid (use PMPC, described below)
                               10     Machine Check
                               11     Invalid (use PMPC, described below)
                               12     Arithmetic
                               13     Invalid (use PMPC, described below)
                               14     MT_FPCR
                               15     Reset

                               Traps due to ITB miss, Istream access violation, or
                               interrupts are not reported in the trap type field because
                               they do not cause pipeline aborts. Instead, these traps cause
                               pipeline redirection and can be distinguished by examining
                               the PMPC value for the presence of the corresponding PALcode
                               entry offset addresses indicated below. In these cases, the
                               ProfileMe interrupt will normally be delivered when exiting
                               the trap PALcode flow and the EXC_ADDR register will contain
                               the original PC that encountered the redirect trap.

                               PMPC[14:0]  Trap
                               0581        ITB miss
                               0481        Istream Access Violation
                               0681        Interrupt

ICM             [33]     RO    ProfileMe Icache Miss.
                               This bit indicates that the profiled instruction was
                               contained in an aligned 4-instruction Icache fetch block that
                               requested a new Icache fill stream.

OVR[2:0]        [32:30]  RO    ProfileMe Counter 0 Overcount.
                               This field indicates a value (0-7) that must be subtracted
                               from the counter 0 result to obtain an accurate count of the
                               number of instructions retired in the interval beginning
                               three cycles after the profiled instruction reaches pipeline
                               stage 2 and ending four cycles after the profiled instruction
                               is retired.

PAR             [29]     W1C   Icache Parity Error.
                               This bit indicates that the Icache encountered a parity error
                               on instruction fetch. When a parity error is detected, the
                               Icache is flushed, a replay trap back to the address of the
                               error instruction is generated, and a correctable read
                               interrupt is requested.

Reserved        [28:0]   RO    Reserved for COMPAQ.
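
The ProfileMe fields of I_STAT can be decoded as sketched below. The helper name `decode_i_stat` and the dictionary keys are hypothetical; the bit positions and trap type values follow Table 5-12. The sketch reports the trap type only when TRP is set, since the field is defined only in that case.

```python
TRAP_TYPES = {
    0: "Replay",
    2: "DTB Double miss (3 level page tables)",
    3: "DTB Double miss (4 level page tables)",
    4: "Floating point disabled",
    5: "Unaligned Load/Store",
    6: "DTB Single miss",
    7: "Dstream Fault",
    8: "OPCDEC",
    10: "Machine Check",
    12: "Arithmetic",
    14: "MT_FPCR",
    15: "Reset",
}


def decode_i_stat(value):
    """Decode the ProfileMe-related fields of an I_STAT value."""
    trp = bool((value >> 39) & 1)
    code = (value >> 34) & 0xF
    return {
        "mis": bool((value >> 40) & 1),
        "trp": trp,
        "lso": bool((value >> 38) & 1),
        # Values 1, 9, 11, and 13 are listed as invalid in Table 5-12.
        "trap_type": TRAP_TYPES.get(code, "Invalid") if trp else None,
        "icm": bool((value >> 33) & 1),
        "ovr": (value >> 30) & 0x7,
        "par": bool((value >> 29) & 1),
    }
```

A caller would subtract the `ovr` value from the counter 0 result, as the OVR description requires.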


5.2.17 Icache Flush Register - IC_FLUSH


The Icache flush register (IC_FLUSH) is a pseudo register. Writing to this register 
invalidates all Icache blocks. The cache is flushed when the next HW_RET/STALL 
instruction is retired. See Section D.20 for more information. 


5.2.18 Icache Flush ASM Register - IC_FLUSH_ASM


The Icache flush ASM register (IC_FLUSH_ASM) is a pseudo register. Writing to this 
register invalidates all Icache blocks with their ASM bit clear. 


5.2.19 Clear Virtual-to-Physical Map Register - CLR_MAP 


The clear virtual-to-physical map register (CLR_MAP) is a pseudo register that, when 
written, results in the clearing of the current map of virtual to physical registers. This 
register must only be written after there are no register-borne dependencies present and 
there are no unretired instructions. See an example in the PALcode restrictions. 


5.2.20 Sleep Mode Register - SLEEP


The sleep mode register (SLEEP) is a pseudo register that, when written, results in the
PLL speed being reduced and the chip entering a low-power mode. This register must
only be written after a sequence of code has been run which saves all necessary state to
DRAM, flushes the caches, and unmasks certain interrupts so the chip can be woken up.
See Section 7.3 for details.


5.2.21 Process Context Register — PCTX 


The process context register (PCTX) contains information associated with the context
of a process. Any combination of the bit fields within this register may be written with
a single HW_MTPR instruction. When bits [7:6] of the IPR index field of a
HW_MTPR instruction contain the value 01 (binary), this register is selected. Bits [4:0]
of the IPR index indicate which bit fields are to be written. Usage of PCTX in performance
monitoring is described in Section 6.10.


Table 5-13 lists the correspondence between IPR index bits and register fields. 


Table 5-13 IPR Index Bits and Register Fields


IPR Index Bit  Register Field

0              ASN
1              ASTER
2              ASTRR
3              PPCE
4              FPE


A HW_MFPR from this register returns the values in all of its component bit fields. 
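
The index encoding from Table 5-13 can be sketched as follows. `pctx_index` is a hypothetical helper name, and the sketch assumes that bit [5] of the index, which the text does not describe, is left as zero.

```python
PCTX_FIELD_BITS = {"ASN": 0, "ASTER": 1, "ASTRR": 2, "PPCE": 3, "FPE": 4}


def pctx_index(*fields):
    """Build the HW_MTPR IPR index that writes the named PCTX fields."""
    index = 0b01 << 6                      # bits [7:6] = 01 select PCTX
    for name in fields:
        index |= 1 << PCTX_FIELD_BITS[name]
    return index
```

For example, a context switch that updates only the address space number would use `pctx_index("ASN")`, while one that also changes the floating-point enable would add `"FPE"`.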


Figure 5-24 shows the process context register. 


Figure 5-24 Process Context Register

[Figure fields, high to low: ASN[7:0], ASTRR[3:0], ASTER[3:0], FPE, PPCE; see Table 5-14 for bit
positions]


Table 5-14 describes the process context register fields. 


Table 5-14 Process Context Register Fields Description


Name        Extent   Type  Description

Reserved    [63:47]  -     -

ASN[7:0]    [46:39]  RW    Address space number.

Reserved    [38:13]  -     -

ASTRR[3:0]  [12:9]   RW    AST request register, used to request AST interrupts in
                           each of the four processor modes.
                           To generate a particular AST interrupt, its corresponding
                           bits in ASTRR and ASTER must be set, along with the
                           ASTE bit in IER.
                           Further, the value of the current mode bits in the PS register
                           must be equal to or higher than the value of the mode
                           associated with the AST request.
                           The bit order with this field is:
                             User Mode        12
                             Supervisor Mode  11
                             Executive Mode   10
                             Kernel Mode      9

ASTER[3:0]  [8:5]    RW    AST enable register, used to individually enable each of
                           the four AST interrupt requests.
                           The bit order with this field is:
                             User Mode        8
                             Supervisor Mode  7
                             Executive Mode   6
                             Kernel Mode      5

Reserved    [4:3]    -     -

FPE         [2]      RW,1  Floating-point enable. If clear, floating-point instructions
                           generate FEN exceptions. This bit is set by hardware on
                           reset.

PPCE        [1]      RW    Process performance counting enable.
                           Enables performance counting for an individual process
                           with counters PCTR0 or PCTR1, which are enabled by
                           setting PCT0_EN or PCT1_EN, respectively.
                           Performance counting for the entire system can be enabled
                           by setting I_CTL[SPCE]. See Section 5.2.15 for more
                           information.
                           See Section 6.10 for information about performance counting.

Reserved    [0]      -     -
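
The AST conditions stated for ASTRR and ASTER can be combined into one check. The following is a sketch under stated assumptions, not the hardware logic: it takes the "ASTE bit in IER" to be IER_CM[ASTEN] (bit 13) and the current mode to be IER_CM[CM] (bits [4:3]), and `ast_pending_modes` is a hypothetical name. Modes are numbered 0 (kernel) through 3 (user), matching the CM encoding.

```python
def ast_pending_modes(pctx, ier_cm):
    """Return the processor modes (0=kernel .. 3=user) with a deliverable AST."""
    astrr = (pctx >> 9) & 0xF       # ASTRR[3:0] in PCTX[12:9], kernel at bit 9
    aster = (pctx >> 5) & 0xF       # ASTER[3:0] in PCTX[8:5], kernel at bit 5
    asten = (ier_cm >> 13) & 1      # assumed to be the "ASTE bit in IER"
    cm = (ier_cm >> 3) & 0x3        # current mode, 00 = kernel .. 11 = user
    if not asten:
        return []
    requested = astrr & aster
    # The current mode value must be equal to or higher than the request's mode.
    return [mode for mode in range(4) if (requested >> mode) & 1 and cm >= mode]
```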


5.2.22 Performance Counter Control Register - PCTR_CTL


The performance counter control register (PCTR_CTL) is a read-write register that
controls the function of the performance counters for either aggregate counting or
ProfileMe sampling counting.


Usage of PCTR_CTL in performance monitoring is described in Section 6.10. 


Figure 5-25 shows the performance counter control register. 


Figure 5-25 Performance Counter Control Register

[Figure fields, high to low: SEXT(PCTR0_CTL[47]), PCTR0[19:0], PM_STALLED, PM_KILLED_BM,
PCTR1[19:0], SL0, SL1[1:0], VAL, TAK; see Table 5-15 for bit positions]


Table 5-15 describes the performance counter control register fields.


Table 5-15 Performance Counter Control Register Fields Description


Name                 Extent   Type  Description

SEXT(PCTR0_CTL[47])  [63:48]  RO    When read, this field is sign extended from
                                    PCTR_CTL[47]. Writes to this field are ignored.

PCTR0[19:0]          [47:28]  RW    Performance counter 0.
                                    PCTR0 is enabled by I_CTL[PCT0_EN] and either
                                    I_CTL[SPCE] or PCTX[PPCE].

                                    In Aggregate mode:
                                    When enabled, PCTR0 is incremented at each cycle by
                                    the selected input. (See Section 6.10.2 for more
                                    information.)
                                    On overflow, if enabled by IER_CM[PCEN0],
                                    ISUM[PC0] is set and an interrupt is triggered.

                                    In ProfileMe mode:
                                    On overflow, a count window is opened and PCTR0 is
                                    incremented as described in Section 6.10.3. When the
                                    count window overflows, if enabled by IER_CM[PCEN0],
                                    ISUM[PC0] is set and an interrupt is triggered.

                                    See Table 5-16 for counter modes.

PM_STALLED           [27]     RO    The profiled instruction stalled for at least one
                                    cycle between the fetch and map stages of the
                                    pipeline.

PM_KILLED_BM         [26]     RO    The profiled instruction was killed during or before
                                    the cycle in which it was mapped.

PCTR1[19:0]          [25:6]   RW    Performance counter 1.
                                    PCTR1 is enabled by I_CTL[PCT1_EN] and either
                                    I_CTL[SPCE] or PCTX[PPCE].

                                    In Aggregate mode:
                                    When enabled, PCTR1 is incremented at each cycle by
                                    the selected input. (See Section 6.10.2 for more
                                    information.)
                                    On overflow, if enabled by IER_CM[PCEN1], ISUM[PC1]
                                    is set and an interrupt is triggered.

                                    In ProfileMe mode, how PCTR1 is incremented is
                                    described in Section 6.10.3.

                                    In either case, PCTR1 is incremented no more than 1
                                    per cycle.

                                    See Table 5-16 for counter modes.

Reserved             [5]      RO    Reads to this field return zero. Writes to this field
                                    are ignored.

SL0                  [4]      RW    Selector 0.
                                    0 = Aggregate counting mode
                                    1 = ProfileMe mode
                                    See Table 5-16 for more information.

SL1[1:0]             [3:2]    RW    Selector 1.
                                    Selects counter PCTR0 and PCTR1 modes. See Table 5-16
                                    for more information.

VAL                  [1]      RO    Profiled instruction valid.
                                    When set, indicates a nontrapping profiled instruction
                                    retired valid. When clear, indicates that a
                                    nontrapping profiled instruction was killed after the
                                    cycle in which it was mapped. Valid retire/abort
                                    status for a trapping profiled instruction is
                                    determined by the trap type (see
                                    I_STAT[TRAP_TYPE]).

TAK                  [0]      RO    ProfileMe conditional branch taken.
                                    Indicates program branch direction, if the profiled
                                    instruction is a conditional branch.

Table 5-16 Performance Counter Control Register Input Select Fields


SL0[4]  SL1[3:2]  Mode       PCTR0                 PCTR1

0       00        Aggregate  Retired instructions  Cycle counting
0       01        Aggregate  Cycle counting        Not defined
0       10        Aggregate  Retired instructions  Bcache miss or long latency probes
0       11        Aggregate  Cycle counting        Mbox replay traps
1       00        ProfileMe  Retired instructions  Cycle counting
1       01        ProfileMe  Cycle counting        Inum retire delay
1       10        ProfileMe  Retired instructions  Bcache miss or long latency probes
1       11        ProfileMe  Cycle counting        Mbox replay traps


5.3 Mbox IPRs


This section describes the internal processor registers that control Mbox functions.


5.3.1 DTB Tag Array Write Registers 0 and 1 - DTB_TAG0, DTB_TAG1


The DTB tag array write registers 0 and 1 (DTB_TAG0 and DTB_TAG1) are write-only
registers through which the two memory pipe DTB tag arrays are written. Write
transactions to DTB_TAG0 and DTB_TAG1 write data to registers outside the DTB
arrays. When write transactions to the corresponding DTB_PTE registers are retired,
the contents of both the DTB_TAG and DTB_PTE registers are written into their
respective DTB arrays, at locations determined by the round-robin allocation algorithm.
Figure 5-26 shows the DTB tag array write registers 0 and 1.


Figure 5-26 DTB Tag Array Write Registers 0 and 1


5.3.2 DTB PTE Array Write Registers 0 and 1 - DTB_PTE0, DTB_PTE1


The DTB PTE array write registers 0 and 1 (DTB_PTE0 and DTB_PTE1) are registers
through which the DTB PTE arrays are written. The entries to be written are chosen by
a round-robin allocation scheme. Write transactions to the DTB_PTE registers, when
retired, result in both the DTB_TAG and DTB_PTE arrays being written. Figure 5-27
shows the DTB PTE array write registers 0 and 1.


Figure 5-27 DTB PTE Array Write Registers 0 and 1

[Figure fields, high to low: PA[43:13], UWE, SWE, EWE, KWE, URE, SRE, ERE, KRE, GH[1:0], ASM,
FOW, FOR]


5.3.3 DTB Alternate Processor Mode Register - DTB_ALTMODE 


The DTB alternate processor mode register (DTB_ALTMODE) is a write-only register
whose contents specify the alternate processor mode used by some HW_LD and
HW_ST instructions. Figure 5-28 shows the DTB alternate processor mode register.


Figure 5-28 DTB Alternate Processor Mode Register

[Figure field: ALT_MODE[1:0]]


Table 5-17 describes the DTB_ALTMODE register fields.


Table 5-17 DTB Alternate Processor Mode Register Fields Description


Name           Extent  Type  Description

Reserved       [63:2]  -     -

ALT_MODE[1:0]  [1:0]   RW    Alt_Mode:
                             ALT_MODE[1:0]  Mode
                             00             Kernel
                             01             Executive
                             10             Supervisor
                             11             User


5.3.4 Dstream TB Invalidate All Process (ASM=0) Register - DTB_IAP 


The Dstream translation buffer invalidate all process (ASM=0) register (DTB_IAP) is a
write-only pseudo register. Write transactions to this register invalidate all DTB entries
in which the address space match (ASM) bit is clear.


5.3.5 Dstream TB Invalidate All Register - DTB_IA


The Dstream translation buffer invalidate all register (DTB_IA) is a write-only pseudo
register. Write transactions to this register invalidate all DTB entries and reset the DTB
not-last-used pointer to its initial state.


5.3.6 Dstream TB Invalidate Single Registers 0 and 1 - DTB_IS0,1


The Dstream translation buffer invalidate single registers (DTB_IS0 and DTB_IS1) are
write-only pseudo registers through which software may invalidate a single entry in the
DTB arrays. Writing a virtual page number to one of these registers invalidates any
DTB entry in the corresponding memory pipeline that meets one of the following criteria:

• The DTB entry’s virtual page number matches DTB_IS[47:13] and its ASN field
  matches DTB_ASN[63:56].

• The DTB entry’s virtual page number matches DTB_IS[47:13] and its ASM bit is
  set.
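
The two invalidation criteria above can be sketched as a small software model. This is only an illustration in Python: the `DtbEntry` class, the list-based DTB, and the function name are hypothetical and do not describe the hardware implementation. The only details taken from the text are the bit ranges DTB_IS[47:13] and DTB_ASN[63:56].

```python
from dataclasses import dataclass


@dataclass
class DtbEntry:
    """Hypothetical software view of one DTB entry."""
    vpn: int            # virtual page number, VA[47:13]
    asn: int            # address space number stored with the entry
    asm: bool           # address space match bit
    valid: bool = True


def invalidate_single(entries, dtb_is_value, dtb_asn_value):
    """Invalidate every entry that meets either criterion from the text."""
    vpn = (dtb_is_value >> 13) & ((1 << 35) - 1)   # DTB_IS[47:13]
    current_asn = (dtb_asn_value >> 56) & 0xFF     # DTB_ASN[63:56]
    for entry in entries:
        if not entry.valid or entry.vpn != vpn:
            continue
        # Criterion 1: ASN matches. Criterion 2: ASM bit is set.
        if entry.asn == current_asn or entry.asm:
            entry.valid = False
    return entries
```

In this model an entry with a matching page number survives only when its ASN differs from the current ASN and its ASM bit is clear.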


Figure 5-29 shows the Dstream translation buffer invalidate single registers. 


Figure 5-29 Dstream Translation Buffer Invalidate Single Registers 


5.3.7 Dstream TB Address Space Number Registers 0 and 1 - DTB_ASN0,1


The Dstream translation buffer address space number registers (DTB_ASN0 and
DTB_ASN1) are write-only registers that should be written with the address space
number (ASN) of the current process. Figure 5-30 shows the Dstream translation buffer
address space number registers 0 and 1.


Figure 5-30 Dstream Translation Buffer Address Space Number Registers 0 and 1 


5.3.8 Memory Management Status Register - MM_STAT


The memory management status register (MM_STAT) is a read-only register.

When a Dstream TB miss or fault occurs, information about the error is latched in
MM_STAT. MM_STAT is not updated when a LD_VPTE instruction gets a DTB miss.
Figure 5-31 shows the memory management status register.


Figure 5-31 Memory Management Status Register


Table 5-18 describes the memory management status register fields.


Table 5-18 Memory Management Status Register Fields Description

Name           Extent    Type  Description
Reserved       [63:11]   -     -
DC_TAG_PERR    [10]      RO    This bit is set when a Dcache tag parity error occurred during the
                               initial tag probe of a load or store instruction. The error created a
                               synchronous fault to the D_FAULT PALcode entry point and is
                               correctable. The virtual address associated with the error is
                               available in the VA register.
OPCODE[5:0]    [9:4]     RO    Opcode of the instruction that caused the error.
                               HW_LD is displayed as 3 and HW_ST is displayed as 7.
FOW            [3]       RO    This bit is set when a fault-on-write error occurs during a write
                               transaction and PTE[FOW] was set.
FOR            [2]       RO    This bit is set when a fault-on-read error occurs during a read
                               transaction and PTE[FOR] was set.
ACV            [1]       RO    This bit is set when an access violation occurs during a
                               transaction. Access violations include a bad virtual address.
WR             [0]       RO    This bit is set when an error occurs during a write transaction.

Note: The Ra field of the instruction that triggered the error can be obtained from
      the Ibox EXC_SUM register.
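
As an illustration of the field layout in Table 5-18, the following Python sketch splits a raw MM_STAT value into its named fields. The function name and the dictionary return type are hypothetical conveniences; only the bit positions come from the table.

```python
def decode_mm_stat(value):
    """Split a 64-bit MM_STAT value into the fields listed in Table 5-18."""
    return {
        "DC_TAG_PERR": (value >> 10) & 0x1,   # bit [10]
        "OPCODE": (value >> 4) & 0x3F,        # bits [9:4]
        "FOW": (value >> 3) & 0x1,            # bit [3]
        "FOR": (value >> 2) & 0x1,            # bit [2]
        "ACV": (value >> 1) & 0x1,            # bit [1]
        "WR": value & 0x1,                    # bit [0]
    }
```

A fault handler model could, for example, test `decode_mm_stat(v)["WR"]` to distinguish a faulting write from a faulting read.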


5.3.9 Mbox Control Register - M_CTL


The Mbox control register (M_CTL) is a write-only register. Its contents are cleared by
chip reset. Figure 5-32 shows the Mbox control register.


Figure 5-32 Mbox Control Register 


Table 5-19 describes the Mbox control register fields.


Table 5-19 Mbox Control Register Fields Description

Name        Extent   Type   Description
Reserved    [63:6]   -      -
SMC[1:0]    [5:4]    WO,0   Speculative miss control (see Section 4.6.4).
                            Bits  Meaning When Set
                            00    Allow full-time speculation.
                            01    Force full-time conservative mode. Make retries wait until
                                  retire, force all new stores that do not hit dirty to retry,
                                  and cause prefetches with modify intent (see Section 2.6.2)
                                  to behave like normal prefetches.
                            10    Place 21264A in periodic conservative mode by using an 8-bit
                                  counter that is incremented by 4 each time a branch mispredict
                                  happens and decremented by 1 each time a conditional branch
                                  retires. Enter conservative mode if the MSB of the counter
                                  is set.
                            11    Place 21264A in periodic conservative mode by using an 8-bit
                                  counter that is incremented by 8 each time a branch mispredict
                                  happens and decremented by 1 each time a conditional branch
                                  retires. Enter conservative mode if the MSB of the counter
                                  is set.
SPE[2:0]    [3:1]    WO,0   Superpage mode enables.
                            SPE[2], when set, enables superpage mapping when VA[47:46] = 2.
                            In this mode, VA[43:13] are mapped directly to PA[43:13] and
                            VA[45:44] are ignored.
                            SPE[1], when set, enables superpage mapping when VA[47:41] = 7E
                            (hexadecimal). In this mode, VA[40:13] are mapped directly to
                            PA[40:13] and PA[43:41] are copies of PA[40] (sign extension).
                            SPE[0], when set, enables superpage mapping when VA[47:30] = 3FFFE
                            (hexadecimal). In this mode, VA[29:13] are mapped directly to
                            PA[29:13] and PA[43:30] are cleared.
Reserved    [0]      -      -

Note: Superpage accesses are only allowed in kernel mode. Non-kernel mode
      references to superpages result in access violations.
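
The three superpage mappings controlled by SPE[2:0] can be sketched as follows. This Python model is illustrative only. It assumes the access is already known to be a kernel-mode access, and it assumes that the in-page offset VA[12:0] passes through unchanged, which the table does not state explicitly. The function and helper names are hypothetical.

```python
def superpage_translate(va, spe):
    """Return the physical address for a superpage VA, or None if no
    enabled superpage region matches. `spe` is the 3-bit SPE[2:0] value."""

    def bits(value, hi, lo):
        # Extract value[hi:lo] as an unsigned integer.
        return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

    offset = bits(va, 12, 0)  # assumption: page offset is not translated

    # SPE[2]: VA[47:46] = 2, VA[43:13] -> PA[43:13], VA[45:44] ignored.
    if spe & 0b100 and bits(va, 47, 46) == 2:
        return (bits(va, 43, 13) << 13) | offset

    # SPE[1]: VA[47:41] = 7E hex, VA[40:13] -> PA[40:13],
    # PA[43:41] are copies of PA[40].
    if spe & 0b010 and bits(va, 47, 41) == 0x7E:
        pa = (bits(va, 40, 13) << 13) | offset
        if bits(va, 40, 40):
            pa |= 0b111 << 41
        return pa

    # SPE[0]: VA[47:30] = 3FFFE hex, VA[29:13] -> PA[29:13], PA[43:30] = 0.
    if spe & 0b001 and bits(va, 47, 30) == 0x3FFFE:
        return (bits(va, 29, 13) << 13) | offset

    return None
```

The three match patterns are mutually exclusive, so the order of the checks does not matter.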


5.3.10 Dcache Control Register - DC_CTL


The Dcache control register (DC_CTL) is a write-only register that controls Dcache
activity. The contents of DC_CTL are initialized by chip reset as indicated. Figure 5-33
shows the Dcache control register.


Figure 5-33 Dcache Control Register


Table 5-20 describes the Dcache control register fields.


Table 5-20 Dcache Control Register Fields Description

Name            Extent   Type   Description
Reserved        [63:8]   -      -
DCDAT_ERR_EN    [7]      WO,0   Dcache data ECC and parity error enable.
DCTAG_PAR_EN    [6]      WO,0   Dcache tag parity enable.
F_BAD_DECC      [5]      WO,0   Force Bad Data ECC. When set, ECC data is not written into
                                the cache along with the block that is loaded by a fill or store.
                                Writing data that is different from that already in the block will
                                cause bad ECC to be present. Since the old ECC value will
                                remain, the ECC will be bad.
F_BAD_TPAR      [4]      WO,0   Force Bad Tag Parity. When set, this bit causes bad tag parity to
                                be put into the Dcache tag array during Dcache fill operations.
Reserved        [3]      -      -
F_HIT           [2]      WO,0   Force Hit. When set, this bit causes all memory space load and
                                store instructions to hit in the Dcache, independent of the
                                Dcache tag address compare. F_HIT does not force the status of
                                the block to register as DIRTY (the tag status bits are still
                                consulted), so stores may still generate offchip activity.
                                In this mode, only one of the two sets may be enabled, and tag
                                parity checking must be disabled (set DCTAG_PAR_EN to
                                zero).
SET_EN[1:0]     [1:0]    WO,3   Dcache Set Enable. At least one set must be enabled.


5.3.11 Dcache Status Register - DC_STAT


The Dcache status register (DC_STAT) is a read-write register. If a Dcache tag parity 
error or data ECC error occurs, information about the error is latched in this register. 
Figure 5-34 shows the Dcache status register. 


Figure 5-34 Dcache Status Register


Table 5-21 describes the Dcache status register fields.


Table 5-21 Dcache Status Register Fields Description

Name          Extent   Type  Description
Reserved      [63:5]   -     -
SEO           [4]      W1C   Second error occurred. When set, this bit indicates that a second
                             Dcache store ECC error occurred within 6 cycles of the previous
                             Dcache store ECC error.
ECC_ERR_LD    [3]      W1C   ECC error on load. When set, this bit indicates that a single-bit ECC
                             error occurred while processing a load from the Dcache or any fill.
ECC_ERR_ST    [2]      W1C   ECC error on store. When set, this bit indicates that an ECC error
                             occurred while processing a store.
TPERR_P1      [1]      W1C   Tag parity error, pipe 1. When set, this bit indicates that a Dcache
                             tag probe from pipe 1 resulted in a tag parity error. The error is
                             uncorrectable and results in a machine check.
TPERR_P0      [0]      W1C   Tag parity error, pipe 0. When set, this bit indicates that a Dcache
                             tag probe from pipe 0 resulted in a tag parity error. The error is
                             uncorrectable and results in a machine check.
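
The DC_STAT bits can be modeled in a few lines of Python. The sketch assumes that the type column of Table 5-21 denotes write-1-to-clear behavior, so that writing a 1 to a bit position clears that status bit and writing a 0 leaves it unchanged. The helper names are hypothetical.

```python
# Bit positions from Table 5-21.
DC_STAT_BITS = {
    "TPERR_P0": 0,
    "TPERR_P1": 1,
    "ECC_ERR_ST": 2,
    "ECC_ERR_LD": 3,
    "SEO": 4,
}


def set_status_names(value):
    """Return the names of the status bits that are set, lowest bit first."""
    ordered = sorted(DC_STAT_BITS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if (value >> bit) & 1]


def write_dc_stat(current, written):
    """Model a write under the assumed write-1-to-clear semantics."""
    return current & ~written & 0x1F
```

For example, an error handler model would read the register, record `set_status_names(value)`, and then write the same value back to clear the reported bits.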


5.4 Cbox CSRs and IPRs 
This section describes the Cbox CSRs and IPRs. 


The Cbox configuration registers are split into three shift register chains:

• The hardware allocates 367 bits for the WRITE_ONCE chain, of which the
  21264A uses 304 bits. During hardware reset (after BIST), 367 bits are always
  shifted into the WRITE_ONCE chain from the SROM, MSB first, so that the
  unused bits are shifted out the end of the WRITE_ONCE chain.

• A 36-bit WRITE_MANY chain that is loaded using MTPR instructions to the Cbox
  data register. Six bits of information are shifted into the WRITE_MANY chain
  during each write transaction to the Cbox data register.

• A 60-bit Cbox ERROR_REG chain that is read by using MFPR instructions from
  the Cbox data register in combination with MTPR instructions to the Cbox shift
  register. Each write transaction to the Cbox shift register destructively shifts six bits
  of information out of the Cbox error register.


5.4.1 Cbox Data Register - C_DATA


Figure 5-35 shows the Cbox data register. 


Figure 5-35 Cbox Data Register 


Table 5-22 describes the Cbox data register fields.


Table 5-22 Cbox Data Register Fields Description

Name           Extent   Type  Description
Reserved       [63:6]   -     -
C_DATA[5:0]    [5:0]    RW    Cbox data register. A HW_MTPR instruction to this register causes six
                              bits of data to be placed into a serial shift register. When the
                              HW_MTPR instruction is retired, the data is shifted into the Cbox.
                              After the Cbox shift register has been accessed, performing a
                              HW_MFPR instruction to this register will return six bits of data.


5.4.2 Cbox Shift Register - C_SHFT
Figure 5-36 shows the Cbox shift register. 


Figure 5-36 Cbox Shift Register
[Figure: register layout with field C_SHIFT.]


Table 5-23 describes the Cbox shift register fields.


Table 5-23 Cbox Shift Register Fields Description

Name       Extent   Type  Description
Reserved   [63:1]   -     -
C_SHIFT    [0]      W1    Writing a 1 to this register bit causes six bits of Cbox IPR data to
                          shift into the Cbox data register. Software can then use a HW_MFPR
                          read operation to the Cbox data register to read the six bits of data.
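
The read protocol described in Tables 5-22 and 5-23 (write C_SHIFT, then read C_DATA, six bits at a time) can be illustrated with a toy model. The class below is hypothetical and is not the hardware. It also assumes that the least significant six-bit group arrives first, which is an interpretation and not something the tables state.

```python
class CboxErrorChainModel:
    """Toy software model of a destructive 6-bit-at-a-time read chain."""

    def __init__(self, value, width=60):
        self.chain = value & ((1 << width) - 1)
        self.c_data = 0

    def write_c_shft(self, bit):
        # Writing a 1 moves six bits of chain data into C_DATA
        # and shifts them out of the chain (destructive).
        if bit & 1:
            self.c_data = self.chain & 0x3F
            self.chain >>= 6

    def read_c_data(self):
        return self.c_data


def read_error_reg(model, width=60):
    """Assemble the full chain value from width/6 shift-and-read steps."""
    value = 0
    for group in range(width // 6):
        model.write_c_shft(1)
        # Assumption: groups arrive least significant first.
        value |= model.read_c_data() << (6 * group)
    return value
```

Because each shift is destructive, the model's chain is empty after one complete read, so software that needs the value again must keep its own copy.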





5.4.3 Cbox WRITE_ONCE Chain Description 
The WRITE_ONCE chain order is contained in Table 5-24. In the table:

• Many CSRs are duplicated for ease of hardware implementation. These CSRs are
  indicated in italics. They must be written with values that are identical to the values
  written to the original CSRs.

• Only a brief description of each CSR is given. The functional description of these
  CSRs is contained in Chapter 4.

• The order of multibit vectors is [MSB:LSB], so the LSB is the first bit in the Cbox
  chain.


Table 5-24 describes the Cbox WRITE_ONCE chain order from LSB to MSB.


Table 5-24 Cbox WRITE_ONCE Chain Order

Cbox WRITE_ONCE Chain           Description
32_BYTE_IO[0]                   Enable 32_BYTE I/O mode.
SKEWED_FILL_MODE[0]             Asserted when Bcache is at 1.5X ratio.
SKEWED_FILL_MODE[0]             Duplicate of prior bit.
DCVIC_THRESHOLD[7:0]            Threshold of the number of Dcache victims that will accumulate
                                before streamed write transactions to the Bcache are initiated.
                                The Cbox can accumulate up to six victims for streamed Dcache
                                processing. This register is programmed with the decoded value
                                of the threshold count.
BC_CLEAN_VICTIM[0]              Enable clean victims to the system interface.
SYS_BUS_SIZE[1:0]               Size of SysAddOut and SysAddIn buses.
SYS_BUS_FORMAT[0]               Indicates system bus format.
SYS_CLK_RATIO[4:1]              Speed of system bus.
                                Code   Multiplier
                                0001   1.5X
                                0010   2.0X
                                0100   2.5X
                                1000   3.0X
DUP_TAG_ENABLE[0]               Enable duplicate tag mode in the 21264A.
PRB_TAG_ONLY[0]                 Enable probe-tag only mode in the 21264A.
FAST_MODE_DISABLE[0]            When asserted, disables fast data movement mode.
BC_RDVICTIM[0]                  Enables RdVictim mode on the pins.
BC_CLEAN_VICTIM[0]              Duplicate CSR.
RDVIC_ACK_INHIBIT               Enable inhibition of incrementing acknowledge counter for RdVic
                                commands.
SYSBUS_MB_ENABLE                Enable MB commands offchip.
SYSBUS_ACK_LIMIT[0:4]           Sysbus acknowledge limit CSR.
SYSBUS_VIC_LIMIT[0:2]           Limit for victims.
BC_CLEAN_VICTIM[0]              Duplicate CSR.
BC_WR_WR_BUBBLE[0]              Write to write GCLK bubble.
BC_RD_WR_BUBBLES[0:5]           Read to write GCLK bubbles for the Bcache interface.
BC_RD_RD_BUBBLE[0]              Read to read GCLK bubble for banked Bcaches.
BC_SJ_BANK_ENABLE               Enable bank mode for Bcache.
BC_WR_RD_BUBBLES[0:3]           Write to read GCLK bubbles.
DUP_TAG_ENABLE                  Duplicate CSR.
SKEWED_FILL_MODE                Duplicate CSR.
BC_RDVICTIM                     Duplicate CSR.
SKEWED_FILL_MODE                Duplicate CSR.
BC_RDVICTIM                     Duplicate CSR.
BC_CLEAN_VICTIM                 Duplicate CSR.
DUP_TAG_MODE                    Duplicate CSR.
SKEWED_FILL_MODE                Duplicate CSR.
ENABLE_PROBE_CHECK              Enable error checking during probe processing.
SPEC_READ_ENABLE[0]             Enable speculative references to the system port.
SKEWED_FILL_MODE                Duplicate CSR.
SKEWED_FILL_MODE                Duplicate CSR.
MBOX_BC_PRB_STALL               Must be asserted when BC_RATIO = 4.0X, 5.0X, 6.0X, 7.0X, or
                                8.0X.
BC_LAT_DATA_PATTERN[0:31]       Bcache data latency pattern.
BC_LAT_TAG_PATTERN[0:23]        Bcache tag latency pattern.
BC_RDVICTIM                     Duplicate CSR.
ENABLE_STC_COMMAND[0]           Enable STx_C instructions to the pins.
BC_LATE_WRITE_NUM[0:2]          Number of Bcache clocks to delay the data for Bcache write
                                commands.
BC_CPU_LATE_WRITE_NUM[0:1]      Number of GCLK cycles to delay the Bcache clock/data from
                                index.
BC_BURST_MODE_ENABLE[0]         Burst mode enable signal.
BC_PENTIUM_MODE[0]              Enable Pentium mode RAM behavior.
SKEWED_FILL_MODE                Duplicate CSR.
BC_FRM_CLK[0]                   Force all Bcache transactions to start on rising edges of the
                                A phase of a GCLK.
BC_CLK_DELAY[0:1]               Delay of Bcache clock for 0, 0, 1, 2 GCLK phases.
BC_DDMR_ENABLE[0]               Enables the rising edge of the Bcache forwarded clock (always
                                enabled).
BC_DDMF_ENABLE[0]               Enables the falling edge of the Bcache forwarded clock (always
                                enabled).
BC_LATE_WRITE_UPPER[0]          Asserted when (BC_LATE_WRITE_NUM > 3) or
                                ((BC_LATE_WRITE_NUM = 3) and
                                (BC_CPU_LATE_WRITE_NUM > 1)).
BC_TAG_DDM_FALL_EN[0]           Enables the update of the 21264A Bcache tag outputs based on the
                                falling edge of the forwarded clock.
BC_TAG_DDM_RISE_EN[0]           Enables the update of the 21264A Bcache tag outputs based on the
                                rising edge of the forwarded clock.
BC_CLKFWD_ENABLE[0]             Enable clock forwarding on the Bcache interface.
BC_RCV_MUX_CNT_PRESET[0:1]      Initial value for the Bcache clock forwarding unload pointer FIFO.
BC_LATE_WRITE_UPPER[0]          Duplicate CSR.
SYS_DDM_FALL_EN[0]              Enables the update of the 21264A system outputs based on the
                                falling edge of the system forwarded clock.
SYS_DDM_RISE_EN[0]              Enables the update of the 21264A system outputs based on the
                                rising edge of the system forwarded clock.
SYS_CLKFWD_ENABLE[0]            Enables clock forwarding on the system interface.
SYS_RCV_MUX_CNT_PRESET[0:1]     Initial value for the system clock forwarding unload pointer FIFO.
SYS_CLK_DELAY[0:1]              Delay of 0 to 2 phases between the forwarded clock out and
                                address/data.
SYS_DDMR_ENABLE[0]              Enables the rising edge of the system forwarded clock (always
                                enabled).
SYS_DDMF_ENABLE[0]              Enables the falling edge of the system forwarded clock (always
                                enabled).
BC_DDM_FALL_EN[0]               Enables update of data/address on the rising edge of the system
                                forwarded clock.
BC_DDM_RISE_EN[0]               Enables the update of data/address on the falling edge of the
                                system forwarded clock.
BC_CLKFWD_ENABLE                Duplicate CSR.
BC_RCV_MUX_CNT_PRESET[0:1]      Duplicate CSR.
BC_CLK_DELAY[0:1]               Duplicate CSR.
BC_DDMR_ENABLE                  Duplicate CSR.
BC_DDMF_ENABLE                  Duplicate CSR.
SYS_DDM_FALL_EN                 Duplicate CSR.
SYS_DDM_RISE_EN                 Duplicate CSR.
SYS_CLKFWD_ENABLE               Duplicate CSR.
SYS_RCV_MUX_CNT_PRESET[0:1]     Duplicate CSR.
SYS_CLK_DELAY[0:1]              Duplicate CSR.
SYS_DDMR_ENABLE                 Duplicate CSR.
SYS_DDMF_ENABLE                 Duplicate CSR.
BC_DDM_FALL_EN                  Duplicate CSR.
BC_DDM_RISE_EN                  Duplicate CSR.
BC_CLKFWD_ENABLE                Duplicate CSR.
BC_RCV_MUX_CNT_PRESET[0:1]      Duplicate CSR.
SYS_DDM_FALL_EN                 Duplicate CSR.
SYS_DDM_RISE_EN                 Duplicate CSR.
SYS_CLKFWD_ENABLE               Duplicate CSR.
SYS_RCV_MUX_CNT_PRESET[0:1]     Duplicate CSR.
SYS_CLK_DELAY[0:1]              Duplicate CSR.
SYS_DDMR_ENABLE                 Duplicate CSR.
SYS_DDMF_ENABLE                 Duplicate CSR.
BC_DDM_FALL_EN                  Duplicate CSR.
BC_DDM_RISE_EN                  Duplicate CSR.
BC_CLKFWD_ENABLE                Duplicate CSR.
BC_RCV_MUX_CNT_PRESET[0:1]      Duplicate CSR.
BC_CLK_DELAY[0:1]               Duplicate CSR.
BC_DDMR_ENABLE                  Duplicate CSR.
BC_DDMF_ENABLE                  Duplicate CSR.
SYS_DDM_FALL_EN                 Duplicate CSR.
SYS_DDM_RISE_EN                 Duplicate CSR.
SYS_CLKFWD_ENABLE               Duplicate CSR.
SYS_RCV_MUX_CNT_PRESET[0:1]     Duplicate CSR.
SYS_CLK_DELAY[1:0]              Duplicate CSR.
SYS_DDMR_ENABLE                 Duplicate CSR.
SYS_DDMF_ENABLE                 Duplicate CSR.
BC_DDM_FALL_EN                  Duplicate CSR.
BC_DDM_RISE_EN                  Duplicate CSR.
BC_CLKFWD_ENABLE                Duplicate CSR.
BC_RCV_MUX_CNT_PRESET[1:0]      Duplicate CSR.
SYS_CLK_DELAY[0:1]              Duplicate CSR.
SYS_DDMR_ENABLE                 Duplicate CSR.
SYS_DDMF_ENABLE                 Duplicate CSR.
SYS_DDM_FALL_EN                 Duplicate CSR.
SYS_DDM_RISE_EN                 Duplicate CSR.
SYS_CLKFWD_ENABLE               Duplicate CSR.
SYS_RCV_MUX_CNT_PRESET[0:1]     Duplicate CSR.
CFR_GCLK_DELAY[0:3]             Number of GCLK cycles to delay internal ClkFwdRst.
CFR_EV6CLK_DELAY[0:2]           Number of EV6Clk_x cycles to delay internal ClkFwdRst.
CFR_FRMCLK_DELAY[0:1]           Number of FrameClk_x cycles to delay internal ClkFwdRst.
BC_LATE_WRITE_NUM[0:2]          Duplicate CSR.
BC_CPU_LATE_WRITE_NUM[1:0]      Duplicate CSR.
JITTER_CMD[0]                   Add one GCLK cycle to the SYSDC write path.
FAST_MODE_DISABLE[0]            Duplicate CSR.
SYSDC_DELAY[3:0]                Number of GCLK cycles to delay SysDc fill commands before
                                action by the Cbox.
DATA_VALID_DLY[1:0]             Number of Bcache clock cycles to delay signal SysDataInValid
                                before sample by the Cbox.
BC_DDM_FALL_EN                  Duplicate CSR.
BC_DDM_RISE_EN                  Duplicate CSR.
BC_CPU_CLK_DELAY[0:1]           Delay of Bcache clock for 0, 1, 2, 3 GCLK cycles.
BC_FDBK_EN[0:7]                 CSR to program the Bcache forwarded clock shift register feedback
                                points.
BC_CLK_LD_VECTOR[0:15]          CSR to program the Bcache forwarded clock shift register load
                                values.
BC_BPHASE_LD_VECTOR[0:3]        CSR to program the Bcache forwarded clock b-phase enables.
SYS_DDM_FALL_EN                 Duplicate CSR.
SYS_DDM_RISE_EN                 Duplicate CSR.
SYS_CPU_CLK_DELAY[0:1]          Delay of 0..3 GCLK cycles between the forwarded clock out and
                                address/data.
SYS_FDBK_EN[0:7]                CSR to program the system forwarded clock shift register feedback
                                points.
SYS_CLK_LD_VECTOR[0:15]         CSR to program the system forwarded clock shift register load
                                values.
SYS_BPHASE_LD_VECTOR[0:3]       CSR to program the system forwarded clock b-phase enables.
SYS_FRAME_LD_VECTOR[0:4]        CSR to program the ratio between frame clock and system
                                forwarded clock.
SYSDC_DELAY[4]                  Fifth SYSDC_DELAY bit.
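
Two entries in Table 5-24 carry small pieces of logic: the SYS_CLK_RATIO code table and the BC_LATE_WRITE_UPPER condition. The Python sketch below restates both so they can be checked mechanically. The function and dictionary names are hypothetical helpers, for example for a script that prepares SROM configuration data, and are not part of any Compaq tool.

```python
# SYS_CLK_RATIO codes and multipliers as listed in Table 5-24.
SYS_CLK_RATIO_MULTIPLIER = {
    0b0001: 1.5,
    0b0010: 2.0,
    0b0100: 2.5,
    0b1000: 3.0,
}


def sys_clk_ratio_code(multiplier):
    """Return the SYS_CLK_RATIO code for a listed multiplier."""
    for code, value in SYS_CLK_RATIO_MULTIPLIER.items():
        if value == multiplier:
            return code
    raise ValueError("multiplier is not listed in Table 5-24")


def bc_late_write_upper(bc_late_write_num, bc_cpu_late_write_num):
    """Evaluate the BC_LATE_WRITE_UPPER condition from Table 5-24."""
    return bc_late_write_num > 3 or (
        bc_late_write_num == 3 and bc_cpu_late_write_num > 1
    )
```

Each listed code has exactly one bit set, so a configuration script can reject any other value before it is shifted into the chain.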


5.4.4 Cbox WRITE_MANY Chain Description 
The WRITE_MANY chain order is contained in Table 5-25. Note the following:

• Many CSRs are duplicated for ease of hardware implementation. These CSR names
  are indicated in italics and have two leading asterisks.

• Only a brief description of each CSR is given. The functional description of these
  CSRs is contained in Chapter 3.

• The order of multibit vectors is [MSB:LSB], so the LSB is the first bit in the Cbox
  chain.


Table 5-25 describes the Cbox WRITE_MANY chain order from LSB to MSB.


Table 5-25 Cbox WRITE_MANY Chain Order

Cbox WRITE_MANY Chain           Description
BC_VALID_MODE                   Control Bcache block parity calculation
INIT_MODE[0]                    Enable initialize mode
BC_SIZE[3:0]                    Bcache size
BC_ENABLE[0]                    Enable the Bcache
BC_ENABLE                       Duplicate CSR
BC_SIZE[0:3]                    Duplicate CSR
BC_ENABLE (1)                   Duplicate CSR
BC_ENABLE (1)                   Duplicate CSR
BC_ENABLE (1)                   Duplicate CSR
INVAL_TO_DIRTY_ENABLE[1]        WH64 acknowledges
ENABLE_EVICT                    Enable issue evict
BC_ENABLE                       Duplicate CSR
INVAL_TO_DIRTY_ENABLE[0]        WH64 acknowledges
BC_ENABLE                       Duplicate CSR
BC_ENABLE                       Duplicate CSR
BC_ENABLE                       Duplicate CSR
SET_DIRTY_ENABLE[0]             SetDirty acknowledge programming
INVAL_TO_DIRTY_ENABLE[0]        Duplicate CSR
SET_DIRTY_ENABLE[2:1]           SetDirty acknowledge programming
BC_BANK_ENABLE[0]               Enable bank mode for Bcache
BC_SIZE[0:3]                    Duplicate CSR
INIT_MODE                       Duplicate CSR
BC_WRT_STS[0:3]                 Write status for Bcache in initialize-mode
                                (Valid, Dirty, Shared, Parity)

(1) MBZ during initialization mode; see Section 7.6 for information.


Figure 5-37 shows an example of PALcode used to write to the WRITE_MANY chain.


Figure 5-37 WRITE_MANY Chain Write Transaction Example

;
; Initialize the Bcache configuration in the Cbox
;
; BC_ENABLE = 1
; INIT_MODE = 0
; BC_SIZE = 0xF
; INVALID_TO_DIRTY_ENABLE = 3
; ENABLE_EVICT = 1
; SET_DIRTY_ENABLE = 6
; BC_BANK_ENABLE = 1
; BC_WRT_STS = 0
;
; The value for the write_many chain is based on Table 5-25.
;
; The value is sampled from MSB, 6 bits at a time, as it is written
; to EV6__DATA. Therefore, before the value can be shifted in, it must be
; inverted on a by 6 basis. The code then writes out 6 bits at a time,
; shifting right by 6 after each write.
;
; So the following transformation is done on the write_many value:
;   [35:30]|[29:24]|[23:18]|[17:12]|[11:06]|[05:00] =>
;   [05:00]|[11:06]|[17:12]|[23:18]|[29:24]|[35:30]
;
; WRITE_MANY chain = 0x07FBFFFFD
; value to be shifted in = 0xF7FFEFFC1
;
; Before the chain can be written, I_CTL[SBE] must be disabled,
; and the code must be forced into the Icache.
;
        ALIGN_CACHE_BLOCK <^x47FF041F>          ; align with nops

        mb                                      ; wait for MEM-OP's to complete
        lda     r0, ^x0086(r31)                 ; load I_CTL ...
        hw_mtpr r0, EV6__I_CTL                  ; ... SDE=2, IC_EN=3, SBE=0
        br      r0, .                           ; create dest address
        addq    r0, #17, r0                     ; finish computing dest address
        hw_mtpr r31, EV6__IC_FLUSH              ; flush the Icache
        bne     r31, .                          ; separate retires
        hw_jmp_stall (r0)                       ; force flush

        ALIGN_CACHE_BLOCK <^x47FF041F>          ; align with nops

bc_config:
        mb                                      ; pull this block in Icache
        lda     r1, ^xFFC1(r31)                 ; data[15:00] = 0xFFC1
        ldah    r0, ^x7FFE(r31)                 ; data[31:16] = 0x7FFE
        zap     r1, #^x0C, r1                   ; clear out bits [31:16]
        bis     r1, r0, r1                      ; or in bits [31:16]
        addq    r31, #6, r0                     ; shift in 6 x 6 bits
bc_config_shift_in:
        hw_mtpr r1, EV6__DATA                   ; shift in 6 bits
        subq    r0, #1, r0                      ; decrement R0
        beq     r0, bc_config_done              ; done if R0 is zero
        srl     r1, #6, r1                      ; align next 6 bits
        br      r31, bc_config_shift_in         ; continue shifting
bc_config_done:
        hw_mtpr r31, <EV6__MM_STAT ! 64>        ; wait until last shift
        beq     r31, bc_config_end              ; predicts fall thru
        br      r31, .-4                        ; predict infinite loop
        bis     r31, r31, r31                   ; nop
        bis     r31, r31, r31                   ; nop
bc_config_end:
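
The two constants in the Figure 5-37 comments can be reproduced with a short Python sketch. It packs the field settings into the 36-bit chain in the LSB-to-MSB order of Table 5-25 and then reverses the six-bit groups. Several points are assumptions rather than statements from the manual: the `CHAIN` list and function names are hypothetical, BC_VALID_MODE = 1 is inferred from bit 0 of the example value, and the bit order inside the entries written as [0:3] is treated as LSB first, which the example cannot confirm because those fields are all ones or all zeros.

```python
# (field name, field bit numbers in chain order, chain LSB first)
CHAIN = [
    ("BC_VALID_MODE", [0]), ("INIT_MODE", [0]), ("BC_SIZE", [0, 1, 2, 3]),
    ("BC_ENABLE", [0]), ("BC_ENABLE", [0]), ("BC_SIZE", [0, 1, 2, 3]),
    ("BC_ENABLE", [0]), ("BC_ENABLE", [0]), ("BC_ENABLE", [0]),
    ("INVAL_TO_DIRTY_ENABLE", [1]), ("ENABLE_EVICT", [0]),
    ("BC_ENABLE", [0]), ("INVAL_TO_DIRTY_ENABLE", [0]),
    ("BC_ENABLE", [0]), ("BC_ENABLE", [0]), ("BC_ENABLE", [0]),
    ("SET_DIRTY_ENABLE", [0]), ("INVAL_TO_DIRTY_ENABLE", [0]),
    ("SET_DIRTY_ENABLE", [1, 2]), ("BC_BANK_ENABLE", [0]),
    ("BC_SIZE", [0, 1, 2, 3]), ("INIT_MODE", [0]),
    ("BC_WRT_STS", [0, 1, 2, 3]),
]


def pack_write_many(fields):
    """Build the 36-bit WRITE_MANY value from a dict of field settings."""
    value = 0
    position = 0
    for name, field_bits in CHAIN:
        for bit in field_bits:
            value |= ((fields[name] >> bit) & 1) << position
            position += 1
    assert position == 36  # the chain is 36 bits long
    return value


def reverse_six_bit_groups(value, width=36):
    """Swap the order of the 6-bit groups, as the figure comments describe."""
    result = 0
    for _ in range(width // 6):
        result = (result << 6) | (value & 0x3F)
        value >>= 6
    return result


EXAMPLE_FIELDS = {
    "BC_VALID_MODE": 1,  # inferred from the example value, not listed
    "INIT_MODE": 0, "BC_SIZE": 0xF, "BC_ENABLE": 1,
    "INVAL_TO_DIRTY_ENABLE": 3, "ENABLE_EVICT": 1,
    "SET_DIRTY_ENABLE": 6, "BC_BANK_ENABLE": 1, "BC_WRT_STS": 0,
}
```

With these settings, `pack_write_many(EXAMPLE_FIELDS)` yields 0x07FBFFFFD and `reverse_six_bit_groups` of that value yields 0xF7FFEFFC1, matching the comments in the figure.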
5.4.5 Cbox Read Register (IPR) Description 


The Cbox read register is read 6 bits at a time. Table 5-26 shows the ordering from LSB
to MSB.


Table 5-26 Cbox Read IPR Fields Description

Name                 Description
C_SYNDROME_1[7:0]    Syndrome for upper QW in OW of victim that was scrubbed.
C_SYNDROME_0[7:0]    Syndrome for lower QW in OW of victim that was scrubbed.
C_STAT[4:0]          Bits    Error Status
                     00000   Either no error, or error on a speculative load, or
                             a Bcache victim read due to a Dcache/Bcache miss
                     00001   BC_PERR (Bcache tag parity error)
                     00010   DC_PERR (duplicate tag parity error)
                     00011   DSTREAM_MEM_ERR
                     00100   DSTREAM_BC_ERR
                     00101   DSTREAM_DC_ERR
                     0011X   PROBE_BC_ERR
                     01000   Reserved
                     01001   Reserved
                     01010   Reserved
                     01011   ISTREAM_MEM_ERR
                     01100   ISTREAM_BC_ERR
                     01101   Reserved
                     0111X   Reserved
                     10011   DSTREAM_MEM_DBL
                     10100   DSTREAM_BC_DBL
                     11011   ISTREAM_MEM_DBL
                     11100   ISTREAM_BC_DBL
C_STS[3:0]           If C_STAT equals xxx_MEM_ERR or xxx_BC_ERR, then C_STS contains the
                     status of the block as follows; otherwise, the value of C_STS is X:
                     Bit Value   Status of Block
                     7:4         Reserved
                     3           Parity
                     2           Valid
                     1           Dirty
                     0           Shared
C_ADDR[6:42]         Address of last reported ECC or parity error. If C_STAT value is
                     DSTREAM_DC_ERR, only bits 6:19 are valid.


Privileged Architecture Library Code





This chapter describes the 21264A privileged architecture library code (PALcode). The 
chapter is organized as follows: 


• PALcode description
• PALmode environment
• Required PALcode function codes
• Opcodes reserved for PALcode
• Internal processor register access mechanisms
• PALshadow registers
• PALcode emulation of FPCR
• PALcode entry points
• Translation buffer fill flows
• Performance counter support


6.1 PALcode Description 


PALcode is macrocode that provides an architecturally-defined, operating-system-spe- 
cific programming interface that is common across all Alpha microprocessors. The 
actual implementation of PALcode differs for each operating system. PALcode runs 
with privileges enabled, instruction stream (Istream) mapping disabled, and interrupts 
disabled. PALcode has privilege to use five special opcodes that allow functions such as 
physical data stream (Dstream) references and internal processor register (IPR) manip- 
ulation. 


PALcode can be invoked by the following events:

• Reset
• System hardware exceptions (MCHK, ARITH)
• Memory-management exceptions
• Interrupts
• CALL_PAL instructions


PALcode has characteristics that make it appear to be a combination of microcode, 
ROM BIOS, and system service routines, though the analogy to any of these other 
items is not exact. PALcode exists for several major reasons: 

• There are some necessary support functions that are too complex to implement
  directly in a processor chip’s hardware, but that cannot be handled by a normal
  operating system software routine. Routines to fill the translation buffer (TB),
  acknowledge interrupts, and dispatch exceptions are some examples. In some
  architectures, these functions are handled by microcode, but the Alpha architecture is
  careful not to mandate the use of microcode so as to allow reasonable chip
  implementations.

• There are functions that must run atomically, yet involve long sequences of
  instructions that may need complete access to all of the underlying computer hardware.
  An example of this is the sequence that returns from an exception or interrupt.

• There are some instructions that are necessary for backward compatibility or ease
  of programming; however, these are not used often enough to dedicate them to
  hardware, or are so complex that they would jeopardize the overall performance of
  the computer. For example, an instruction that does a VAX-style interlocked memory
  access might be familiar to someone used to programming on a CISC machine,
  but is not included in the Alpha architecture. Another example is the emulation of
  an instruction that has no direct hardware support in a particular chip
  implementation.


In each of these cases, PALcode routines are used to provide the function. The routines
are nothing more than programs invoked at specified times, and read in as Istream code
in the same way that all other Alpha code is read. Once invoked, however, PALcode
runs in a special mode called PALmode.


6.2 PALmode Environment 


PALcode runs in a special environment called PALmode, defined as follows: 


• Istream memory mapping is disabled. Because the PALcode is used to implement
  translation buffer fill routines, Istream mapping clearly cannot be enabled. Dstream
  mapping is still enabled.

• The program has privileged access to all of the computer hardware. Most of the
  functions handled by PALcode are privileged and need control of the lowest
  levels of the system.

• Interrupts are disabled. If a long sequence of instructions needs to be executed
  atomically, interrupts cannot be allowed.


An important aspect of PALcode is that it uses normal Alpha instructions for most of its
operations; that is, the same instruction set that nonprivileged Alpha programmers use.
There are a few extra instructions that are only available in PALmode, and will cause a
dispatch to the OPCDEC PALcode entry point if attempted while not in PALmode. The
Alpha architecture allows some flexibility in what these special PALmode instructions do.
In the 21264A, the special PALmode-only instructions perform the following functions:

• Read or write internal processor registers (HW_MFPR, HW_MTPR)

• Perform memory load or store operations without invoking the normal
  memory-management routines (HW_LD, HW_ST)

• Return from an exception or interrupt (HW_RET)


When executing in PALmode, there are certain restrictions for using the privileged
instructions because PALmode gives the programmer complete access to many of the
internal details of the 21264A. Refer to Section 6.4 for information on these special
PALmode instructions.


Caution: It is possible to cause unintended side effects by writing what appears to be 
perfectly acceptable PALcode. As such, PALcode is not something that 
many users will want to change. Before writing PALcode, at least become 
familiar with the information in Appendix D. 


6.3 Required PALcode Function Codes 


Table 6-1 lists opcodes required for all Alpha implementations. The notation used is
oo.ffff, where oo is the hexadecimal 6-bit opcode and ffff is the hexadecimal 26-bit
function code.


Table 6-1 Required PALcode Function Codes 


Mnemonic   Type           Function Code
DRAINA     Privileged     00.0002
HALT       Privileged     00.0000
IMB        Unprivileged   00.0086
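
To make the oo.ffff notation concrete, the sketch below converts an entry from Table 6-1 into a 32-bit instruction word. It assumes the standard Alpha PALcode instruction format, with the 6-bit opcode in bits [31:26] and the 26-bit function code in bits [25:0]. The function name is a hypothetical helper.

```python
def encode_pal(notation):
    """Encode an 'oo.ffff' string (hex opcode, hex function code)
    as a 32-bit instruction word."""
    opcode_hex, function_hex = notation.split(".")
    opcode = int(opcode_hex, 16)
    function = int(function_hex, 16)
    if opcode >= (1 << 6) or function >= (1 << 26):
        raise ValueError("opcode or function code out of range")
    return (opcode << 26) | function
```

Because all three required function codes use opcode 00, their instruction words are numerically equal to their function codes.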


6.4 Opcodes Reserved for PALcode 


Table 6-2 lists the opcodes reserved by the Alpha architecture for implementation-specific
use. These opcodes are privileged and are only available in PALmode.


Table 6-2 Opcodes Reserved for PALcode 


                      Architecture
Mnemonic    Opcode    Mnemonic       Function
HW_LD       1B        PAL1B          Dstream load instruction
HW_ST       1F        PAL1F          Dstream store instruction
HW_RET      1E        PAL1E          Return from PALcode routine
HW_MFPR     19        PAL19          Copies the value of an IPR into an integer GPR
HW_MTPR     1D        PAL1D          Writes the value of an integer GPR into an IPR


These instructions generally produce an OPCDEC exception if executed while the processor
is not in PALmode. If I_CTL[HWE] is set, these instructions can also be executed
in kernel mode. Software that uses these instructions must adhere to the PALcode
restrictions listed in this section.


6.4.1 HW_LD Instruction 


PALcode uses the HW_LD instruction to access memory outside the realm of normal 
Alpha memory management and to perform special Dstream load transactions. Data 
alignment traps are disabled for the HW_LD instruction. 


Figure 6-1 shows the HW_LD instruction format. 




Figure 6-1 HW_LD Instruction Format 

Table 6-3 describes the HW_LD instruction fields.


Table 6-3 HW_LD Instruction Fields Descriptions


Extent    Mnemonic  Value  Description
[31:26]   OPCODE    1B₁₆   The opcode value.
[25:21]   RA        -      Destination register number.
[20:16]   RB        -      Base register for memory address.
[15:13]   TYPE      000₂   Physical - The effective address for the HW_LD instruction is physical.
                    001₂   Physical/Lock - The effective address for the HW_LD instruction is
                           physical. It is the load lock version of the HW_LD instruction.
                    010₂   Virtual/VPTE - Flags a virtual PTE fetch (LD_VPTE). Used by trap logic
                           to distinguish a single TB miss from a double TB miss. Kernel mode access
                           checks are performed.
                    100₂   Virtual - The effective address for the HW_LD instruction is virtual.
                    101₂   Virtual/WrChk - The effective address for the HW_LD instruction is
                           virtual. Access checks for fault-on-read (FOR), fault-on-write (FOW), read,
                           and write protection.
                    110₂   Virtual/Alt - The effective address for the HW_LD instruction is virtual.
                           Access checks use DTB_ALT_MODE IPR.
                    111₂   Virtual/WrChk/Alt - The effective address for the HW_LD instruction is
                           virtual. Access checks for FOR, FOW, read and write protection. Access
                           checks use DTB_ALT_MODE IPR.
[12]      LEN       0      Access length is longword.
                    1      Access length is quadword.
[11:0]    DISP      -      Holds a 12-bit signed byte displacement.


6.4.2 HW_ST Instruction


PALcode uses the HW_ST instruction to access memory outside the realm of normal 
Alpha memory management and to do special forms of Dstream store instructions. Data 
alignment traps are inhibited for HW_ST instructions. Figure 6-2 shows the HW_ST 
instruction format. 


Figure 6-2 HW_ST Instruction Format 

Table 6-4 describes the HW_ST instruction fields.


Table 6-4 HW_ST Instruction Fields Descriptions


Extent    Mnemonic  Value       Description
[31:26]   OPCODE    1F₁₆        The opcode value.
[25:21]   RA        -           Write data register number.
[20:16]   RB        -           Base register for memory address.
[15:13]   TYPE      000₂        Physical - The effective address for the HW_ST instruction is physical.
                    001₂        Physical/Cond - The effective address for the HW_ST instruction is
                                physical. Store conditional version of the HW_ST instruction. The lock
                                flag is returned in RA. Refer to PALcode restrictions for correct use of this
                                function.
                    010₂        Virtual - The effective address for the HW_ST instruction is virtual.
                    110₂        Virtual/Alt - The effective address for the HW_ST instruction is virtual.
                                Access checks use DTB_ALT_MODE IPR.
                    All others  Unused.
[12]      LEN       0           Access length is longword.
                    1           Access length is quadword.
[11:0]    DISP      -           Holds a 12-bit signed byte displacement.
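
The field layouts in Tables 6-3 and 6-4 can be sketched as a pair of Python helpers that pack and unpack a HW_LD or HW_ST instruction word. The function names and the dictionary keys are hypothetical; only the bit positions, opcodes, and field widths come from the tables. The sketch does not check whether a TYPE value is defined for the chosen opcode.

```python
HW_LD_OPCODE = 0x1B
HW_ST_OPCODE = 0x1F


def encode_hw_mem(opcode, ra, rb, type_bits, quadword, disp):
    """Pack the HW_LD/HW_ST fields into a 32-bit instruction word."""
    if opcode not in (HW_LD_OPCODE, HW_ST_OPCODE):
        raise ValueError("opcode must be 0x1B (HW_LD) or 0x1F (HW_ST)")
    if not (0 <= ra < 32 and 0 <= rb < 32):
        raise ValueError("register numbers are 5 bits wide")
    if not 0 <= type_bits < 8:
        raise ValueError("TYPE is 3 bits wide")
    if not -2048 <= disp <= 2047:
        raise ValueError("DISP is a 12-bit signed byte displacement")
    return ((opcode << 26) | (ra << 21) | (rb << 16) |
            (type_bits << 13) | (int(bool(quadword)) << 12) | (disp & 0xFFF))


def decode_hw_mem(word):
    """Split a 32-bit HW_LD/HW_ST word back into its fields."""
    disp = word & 0xFFF
    if disp & 0x800:          # sign-extend the 12-bit displacement
        disp -= 0x1000
    return {
        "opcode": (word >> 26) & 0x3F,
        "ra": (word >> 21) & 0x1F,
        "rb": (word >> 16) & 0x1F,
        "type": (word >> 13) & 0x7,
        "quadword": bool((word >> 12) & 1),
        "disp": disp,
    }


if __name__ == "__main__":
    # A quadword HW_LD with TYPE = 010 (Virtual/VPTE), RA = RB = 4, DISP = 0.
    print(f"{encode_hw_mem(HW_LD_OPCODE, 4, 4, 0b010, True, 0):08X}")
```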


6.4.3 HW_RET Instruction 


The HW_RET instruction is used to return instruction flow to a specified PC. The RB 
field of the HW_RET instruction specifies an integer GPR, which holds the new value 
of the PC. Bit [0] of this register provides the new value of PALmode after the 
HW_RET instruction is executed. Bits [15:14] of the instruction determine the stack 
action. 


Normally the HW_RET instruction follows a CALL_PAL instruction, or a trap handler
that pushed its PC onto the prediction stack. In this mode, the HINT should be set
to '10' to pop the PC and generate a predicted target address for the HW_RET
instruction.


In some conditions, the HW_RET instruction is used in the middle of a PALcode flow 
to cause a group of instructions to retire. In these cases, if the HW_RET instruction 
does not have a corresponding instruction that pushed a PC onto the stack, the HINT 
field should be set to ‘00’ to keep the stack from being modified. 


In the rare circumstance that the HW_RET instruction might be used like a JSR or 
JSR_COROUTINE, the stack can be managed by setting the HINT bits accordingly. 


See Section D.25 for more information about the HW_RET instruction. 


Figure 6-3 shows the HW_RET instruction format. 

Figure 6-3 HW_RET Instruction Format


Table 6-5 describes the HW_RET instruction fields.


Table 6-5 HW_RET Instruction Fields Descriptions


Extent    Mnemonic  Value  Description
[31:26]   OPCODE    1E₁₆   The opcode value.
[25:21]   RA        -      Register number. It should be R31.
[20:16]   RB        -      Target PC of the HW_RET instruction. Bit [0] of the register's contents
                           determines the new value of PALmode.
[15:14]   HINT      00     HW_JMP - The PC is not pushed onto the prediction stack. The predicted
                           target is PC + (4*DISP[12:0]).
                    01     HW_JSR - The PC is pushed onto the prediction stack. The predicted
                           target is PC + (4*DISP[12:0]).
                    10     HW_RET - The prediction is popped off the stack and used as the target.
                    11     HW_COROUTINE - The prediction is popped off the stack and used as
                           the target. The PC is pushed onto the stack.
[13]      STALL     -      If set, the fetcher is stalled until the HW_RET instruction is retired or
                           aborted. The 21264A will:
                           • Force a mispredict
                           • Kill instructions that were fetched beyond the HW_RET instruction
                           • Refetch the target of the HW_RET instruction
                           • Stall until the HW_RET instruction is retired or aborted
                           If instructions beyond the HW_RET have been issued out of order, they
                           will be killed and refetched.
[12:0]    DISP      -      Holds a 13-bit signed longword displacement.
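
As a rough illustration of Table 6-5, the following Python sketch packs the HW_RET fields and evaluates the predicted-target expression used by the HW_JMP and HW_JSR hints. The helper names are hypothetical, and the pc argument of predicted_target stands for whichever PC value the hardware uses as the base, which this section does not specify.

```python
HW_RET_OPCODE = 0x1E
HINT = {"HW_JMP": 0b00, "HW_JSR": 0b01, "HW_RET": 0b10, "HW_COROUTINE": 0b11}


def encode_hw_ret(rb, hint="HW_RET", stall=False, disp=0, ra=31):
    """Pack the HW_RET fields; RA defaults to R31 as Table 6-5 recommends."""
    if not (0 <= ra < 32 and 0 <= rb < 32):
        raise ValueError("register numbers are 5 bits wide")
    if not -4096 <= disp <= 4095:
        raise ValueError("DISP is a 13-bit signed longword displacement")
    return ((HW_RET_OPCODE << 26) | (ra << 21) | (rb << 16) |
            (HINT[hint] << 14) | (int(bool(stall)) << 13) | (disp & 0x1FFF))


def predicted_target(pc, disp):
    """Predicted target for the HW_JMP and HW_JSR hints: PC + 4*DISP."""
    return pc + 4 * disp


def new_palmode(rb_value):
    """Bit [0] of the RB register contents gives the new PALmode value."""
    return bool(rb_value & 1)


if __name__ == "__main__":
    # HW_RET through R23 with the pop hint and the STALL bit set.
    print(f"{encode_hw_ret(23, 'HW_RET', stall=True):08X}")
```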





6.4.4 HW_MFPR and HW_MTPR Instructions 


The HW_MFPR and HW_MTPR instructions are used to access internal processor registers.
The HW_MFPR instruction reads the value from the specified IPR into the integer
register specified by the RA field of the instruction. The HW_MTPR instruction
writes the value from the integer GPR, specified by the RB field of the instruction, into
the specified IPR. Figure 6-4 shows the HW_MFPR and HW_MTPR instructions format.


Figure 6-4 HW_MFPR and HW_MTPR Instructions Format 

Table 6-6 describes the HW_MFPR and HW_MTPR instructions fields.


Table 6-6 HW_MFPR and HW_MTPR Instructions Fields Descriptions


Extent    Mnemonic   Value  Description
[31:26]   OPCODE     19₁₆   The opcode value for the HW_MFPR instruction.
                     1D₁₆   The opcode value for the HW_MTPR instruction.
[25:21]   RA         -      Destination register for the HW_MFPR instruction. It should be R31
                            for the HW_MTPR instruction.
[20:16]   RB         -      Source register for the HW_MTPR instruction. It should be R31 for the
                            HW_MFPR instruction.
[15:8]    INDEX      -      IPR index.
[7:0]     SCBD_MASK  -      Specifies which IPR scoreboard bits in the IQ are to be applied to this
                            instruction. If a mask bit is set, it indicates that the corresponding IPR
                            scoreboard bit should be applied to this instruction.
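
The layout in Table 6-6 can be sketched in Python as follows. The helper names are hypothetical, and the IPR index 0x12 used in the example is a placeholder chosen for illustration only; real index values are defined with the IPRs themselves.

```python
HW_MFPR_OPCODE = 0x19
HW_MTPR_OPCODE = 0x1D
R31 = 31


def scbd_mask(*bits):
    """Build the 8-bit SCBD_MASK from a list of scoreboard bit numbers (0-7)."""
    mask = 0
    for bit in bits:
        if not 0 <= bit <= 7:
            raise ValueError("scoreboard bits are numbered 0 through 7")
        mask |= 1 << bit
    return mask


def _encode(opcode, ra, rb, index, mask):
    if not (0 <= ra < 32 and 0 <= rb < 32):
        raise ValueError("register numbers are 5 bits wide")
    if not (0 <= index < 256 and 0 <= mask < 256):
        raise ValueError("INDEX and SCBD_MASK are 8 bits wide")
    return (opcode << 26) | (ra << 21) | (rb << 16) | (index << 8) | mask


def encode_hw_mfpr(ra, index, mask=0):
    """HW_MFPR: RA is the destination register, RB should be R31."""
    return _encode(HW_MFPR_OPCODE, ra, R31, index, mask)


def encode_hw_mtpr(rb, index, mask=0):
    """HW_MTPR: RB is the source register, RA should be R31."""
    return _encode(HW_MTPR_OPCODE, R31, rb, index, mask)


if __name__ == "__main__":
    PLACEHOLDER_INDEX = 0x12  # hypothetical IPR index, for illustration only
    print(f"{encode_hw_mtpr(6, PLACEHOLDER_INDEX, scbd_mask(2, 6)):08X}")
```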


6.5 Internal Processor Register Access Mechanisms 


This section describes the hardware and software access mechanisms that are used for
the 21264A IPRs.


Because the Ibox reorders and executes instructions speculatively, extra hardware is 
required to provide software with the correct view of the architecturally-defined state. 
The Alpha architecture defines two classes of state: general-purpose registers and 
memory. Register renaming is used to provide architecturally-correct register file 
behavior. The Ibox and Mbox each have dedicated hardware that provides correct mem- 
ory behavior to the programmer. Because the internal processor registers are implemen- 
tation-specific, and their state is not defined by the Alpha architecture, access 
mechanisms for these registers may be defined that impose restrictions and limitations 
on the software that uses them. 


For every IPR, each instruction type can be classified by how it affects and is affected 
by the value held by that IPR. 


• Explicit readers are HW_MFPR instructions that explicitly read the value of the
  IPR.

• Implicit readers are instructions whose behavior is affected by the value of the IPR.
  For example, each load instruction is an implicit reader of the DTB.

• Explicit writers are HW_MTPR instructions that explicitly write a value into the
  IPR.

• Implicit writers are instructions that may write a value into the IPR as a side effect
  of execution. For example, a load instruction that generates an access violation is
  an implicit writer of the VA, MM_STAT, and EXC_ADDR IPRs. In the 21264A,
  only instructions that generate an exception will act as implicit IPR writers.


Only certain IPRs, such as those with write-one-to-clear bits, are both implicitly and 
explicitly written. The read-write semantics of these IPRs is controlled by software. 


6.5.1 IPR Scoreboard Bits


In previous Alpha implementations, IPR registers were not scoreboarded in hardware. 
Software was required to schedule HW_MTPR and HW_MFPR instructions for each
machine’s pipeline organization in order to ensure correct behavior. This software 
scheduling task is more difficult in the 21264A because the Ibox performs dynamic 
scheduling. Hence, eight extra scoreboard bits are used within the IQ to help maintain 
correct IPR access order. The HW_MTPR and HW_MFPR instruction formats contain 
an 8-bit field that is used as an IPR scoreboard bit mask to specify which of the eight 
IPR scoreboard bits are to be applied to the instruction. 


If any of the unmasked scoreboard bits are set when an instruction is about to enter the 
IQ, then the instruction, and those behind it, are stalled outside the IQ until all the 
unmasked scoreboard bits are clear and the queue does not contain any implicit or 
explicit readers that were dependent on those bits when they entered the queue. When 
all the unmasked scoreboard bits are clear, and the queue does not contain any of those 
readers, the instruction enters the IQ and the unmasked scoreboard bits are set. 


HW_MFPR instructions are stalled in the IQ until all their unmasked IPR scoreboard 
bits are clear. 


When scoreboard bits [3:0] and [7:4] are set, their effect on other instructions is different,
and they are cleared in a different manner.


If any of scoreboard bits [3:0] are set when a load or store instruction enters the IQ, that 
load or store instruction will not be issued from the IQ until those scoreboard bits are 
clear. 


Scoreboard bits [3:0] are cleared when the HW_MTPR instructions that set them are 
issued (or are aborted). Bits [7:4] are cleared when the HW_MTPR instructions that set 
them are retired (or are aborted). 


Bits [3:0] are used for the DTB_TAG and DTB_PTE register pairs within the DTB fill 
flows. These bits can be used to order writes to the DTB for load and store instructions. 
See Sections 5.3.1 and 6.9.1. 


Bit [0] is used in both DTB and ITB fill flows to trigger, in hardware, a light-weight 
memory barrier (TB-MB) to be inserted between a LD_VPTE and the corresponding 
virtual-mode load instruction that missed in the TB. 
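
The set and clear rules above can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not a description of the hardware: the class and method names are hypothetical, the model does not track the additional condition that the queue holds no dependent readers, and it does not model the TB-MB behavior of bit [0].

```python
class IprScoreboard:
    """Toy model of the eight IPR scoreboard bits kept in the IQ."""

    LOW = 0x0F    # bits [3:0]: cleared when the setting HW_MTPR issues
    HIGH = 0xF0   # bits [7:4]: cleared when the setting HW_MTPR retires

    def __init__(self):
        self.bits = 0

    def mtpr_try_enter_iq(self, mask):
        """Return False (stall outside the IQ) if any selected bit is set."""
        if self.bits & mask:
            return False
        self.bits |= mask          # entering the IQ sets the selected bits
        return True

    def mtpr_issued(self, mask):
        self.bits &= ~(mask & self.LOW)

    def mtpr_retired(self, mask):
        self.bits &= ~(mask & self.HIGH)

    def mtpr_aborted(self, mask):
        self.bits &= ~mask         # an abort clears both halves

    def mfpr_can_issue(self, mask):
        """HW_MFPR waits in the IQ until all of its selected bits are clear."""
        return not (self.bits & mask)

    def load_store_can_issue(self):
        """Loads and stores wait in the IQ while any of bits [3:0] is set."""
        return not (self.bits & self.LOW)


if __name__ == "__main__":
    sb = IprScoreboard()
    sb.mtpr_try_enter_iq(0x44)            # a writer using bits 2 and 6
    print(sb.load_store_can_issue())      # bit 2 is still set
    sb.mtpr_issued(0x44)
    print(sb.load_store_can_issue())      # bit 2 cleared at issue
```

In this model a second writer that selects bit 6 stays stalled until the first writer retires, while loads and stores are released as soon as the first writer issues.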


6.5.2 Hardware Structure of Explicitly Written IPRs 


IPRs that are written by software are physically implemented as two registers. When 
the HW_MTPR instruction that writes the IPR executes, it writes its value to the first 
register. When the HW_MTPR instruction is retired, the contents of the first register are 
written into the second register. Instructions that either implicitly or explicitly read the 
value of the IPR access the second register. Read-after-write and write-after-write 
dependencies are managed using the IPR scoreboard bits. To avoid write-after-read
conflicts, the second register is not written until the writer is retired. The writer will not 
be retired until the previous reader is retired, and the reader is retired after it has read its 
value from the second register. 


Some groups of IPRs are built using a single shared first register. To prevent write- 
after-write conflicts, IPRs that share a first register also share scoreboard bits. 
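
The two-register structure can be sketched as a minimal Python model. The class and method names are hypothetical, and the model covers a single IPR with a private first register; aborts and shared first registers are not modeled.

```python
class ExplicitlyWrittenIpr:
    """Toy model of an IPR built from a first and a second register."""

    def __init__(self, value=0):
        self.first = value    # written when the HW_MTPR executes
        self.second = value   # written when the HW_MTPR retires

    def mtpr_execute(self, value):
        self.first = value

    def mtpr_retire(self):
        self.second = self.first

    def read(self):
        """Implicit and explicit readers always see the second register."""
        return self.second


if __name__ == "__main__":
    ipr = ExplicitlyWrittenIpr(value=1)
    ipr.mtpr_execute(2)
    print(ipr.read())   # still the old value: the writer has not retired
    ipr.mtpr_retire()
    print(ipr.read())   # the new value is now visible to readers
```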


6.5.3 Hardware Structure of Implicitly Written IPRs


Implicitly written IPRs are physically built using only a single level of register; however,
the IPR has two hardware states associated with it:


1. Default State—The contents of the register may be written when an instruction gen- 
erates an exception. If an exception occurs, write a new value into the IPR and go to 
state 2. 


2. Locked State—The contents of the register may only be overwritten by an except- 
ing instruction that is older than the instruction associated with the contents of the 
IPR. If such an exception occurs, overwrite the value of the IPR. When the trigger- 
ing instruction, or instruction that is older than the triggering instruction, is killed 
by the Ibox, go to state 1. 
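
The two states can be sketched as a small state machine in Python. The sketch assumes that instruction age is represented by a sequence number in program order, where a smaller number means an older instruction; that representation, and the class and method names, are illustrative choices and not part of the specification.

```python
class ImplicitlyWrittenIpr:
    """Toy model of the Default/Locked states of an implicitly written IPR."""

    def __init__(self):
        self.locked = False
        self.value = None
        self.owner_seq = None   # sequence number of the triggering instruction

    def exception(self, seq, value):
        """An instruction with sequence number `seq` takes an exception.

        Returns True if the IPR was written.
        """
        if not self.locked or seq < self.owner_seq:
            self.value = value
            self.owner_seq = seq
            self.locked = True      # state 2: Locked
            return True
        return False                # a younger excepting instruction is ignored

    def kill(self, seq):
        """The Ibox kills instruction `seq`; unlock if it is the trigger or older."""
        if self.locked and seq <= self.owner_seq:
            self.locked = False     # back to state 1: Default


if __name__ == "__main__":
    ipr = ImplicitlyWrittenIpr()
    ipr.exception(10, "fault A")
    ipr.exception(12, "fault B")    # younger: does not overwrite
    print(ipr.value)
```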


6.5.4 IPR Access Ordering 


IPR access mechanisms must allow values to be passed through each IPR from a pro- 
ducer to its intended consumers. 


Table 6-7 lists all of the paired instruction orderings between instructions of the four 
IPR access types. It specifies whether access order must be maintained, and if so, the 
mechanisms used to ensure correct ordering. 


Table 6-7 Paired Instruction Fetch Order


Second Instruction: Implicit Reader

  First Instruction   Ordering
  Implicit Reader     Read transactions can be reordered.
  Implicit Writer     No IPRs in this class.
  Explicit Reader     Read transactions can be reordered.
  Explicit Writer     A variety of mechanisms are used to ensure order: scoreboard bits
                      to stall issue of reader; HW_RET_STALL to stall reader; double
                      write plus buffer blocks to force retire and allow for propagation
                      delay.

Second Instruction: Implicit Writer

  First Instruction   Ordering
  Implicit Reader     No IPRs in this class.
  Implicit Writer     The hardware structure of implicitly written IPRs handles this case.
  Explicit Reader     IPR-specific PALcode restrictions are required for this case. An
                      interlock mechanism must be placed between the explicit reader and
                      the implicit writer (a read transaction).
  Explicit Writer     No IPRs in this class.

Second Instruction: Explicit Reader

  First Instruction   Ordering
  Implicit Reader     Read transactions can be reordered.
  Implicit Writer     If the reader is in the PALcode routine invoked by the exception
                      associated with the writer, then ordering is guaranteed.
  Explicit Reader     Read transactions can be reordered.
  Explicit Writer     Scoreboard bits stall issue of reader until writer is retired.

Second Instruction: Explicit Writer

  First Instruction   Ordering
  Implicit Reader     Reader reads second register. Writer cannot write second register
                      until it is retired.
  Implicit Writer     Write-one-to-clear bits, or performance counter special case. For
                      example, performance counter increments are typically not
                      scoreboarded against read transactions.
  Explicit Reader     Reader reads second register. Writer cannot write second register
                      until it is retired.
  Explicit Writer     Scoreboard bits stall second writer in map stage until first writer
                      is retired.





For convenience of implementation, there is no IPR scoreboard bit checking within the
same fetch block (octaword-aligned octaword).


• Within one fetch block, there can be only one explicit writer (HW_MTPR) to an
  IPR in a particular scoreboard group.

• Within one fetch block, an explicit writer (HW_MTPR) to an IPR in a particular
  scoreboard group cannot be followed by an explicit reader (HW_MFPR) to an IPR
  in that same scoreboard group.

• Within one fetch block, an explicit writer (HW_MTPR) to an IPR in a particular
  scoreboard group cannot be followed by an implicit reader to an IPR in that
  scoreboard group. This case covers writes to DTB_PTE or DTB_TAG followed by a LD,
  ST, or any memory operation, including HW_RETs without the 'stall' bit set.
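
The three fetch-block rules lend themselves to a simple static check. The Python sketch below is one possible way to express them, assuming 4-byte instructions and 16-byte (octaword) fetch blocks; the function name, the instruction "kind" labels, and the scoreboard group names are all hypothetical.

```python
FETCH_BLOCK_BYTES = 16   # one octaword-aligned octaword


def fetch_block_violations(instructions):
    """Check a sequence of (pc, kind, group) tuples given in program order.

    kind is "mtpr", "mfpr", "implicit_read", or "other"; group names the
    scoreboard group the instruction touches (None for "other").
    Returns a list of (pc, reason) tuples for every rule that is broken.
    """
    violations = []
    writers_seen = set()   # (fetch block number, group) pairs that hold a writer
    for pc, kind, group in instructions:
        key = (pc // FETCH_BLOCK_BYTES, group)
        if kind == "mtpr":
            if key in writers_seen:
                violations.append((pc, "second explicit writer in fetch block"))
            writers_seen.add(key)
        elif kind in ("mfpr", "implicit_read") and key in writers_seen:
            violations.append((pc, kind + " follows explicit writer in fetch block"))
    return violations


if __name__ == "__main__":
    code = [
        (0x100, "mtpr", "dtb"),
        (0x104, "implicit_read", "dtb"),   # same fetch block: flagged
        (0x110, "implicit_read", "dtb"),   # next fetch block: allowed
    ]
    print(fetch_block_violations(code))
```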


6.5.5 Correct Ordering of Explicit Writers Followed by Implicit Readers 


Across fetch blocks, the correct ordering of the explicit write of the DTB_PTE or
DTB_TAG followed by an implicit reader (memory operation) is guaranteed using the 
IPR scoreboard bits. 


However, there are cases where correct ordering of explicit writers followed by implicit 
readers cannot be guaranteed using the IPR scoreboard mechanism. If the instruction 
that implicitly reads the IPR does so before the issue stage of the pipeline, the score- 
board mechanism is not sufficient. 


For example, modification of the ITB affects instructions before the issue stage of the
pipeline. In this case, PALcode must contain a HW_RET instruction, with its stall bit 
set, before any instruction that implicitly reads the IPR(s) in question. This prevents 
instructions that are newer than the HW_RET instruction from being successfully 
fetched, issued, and retired until after the HW_RET instruction is retired (or aborted). 


There are also cases when the HW_RET with the STALL bit mechanism is not sufficient.
There may be additional propagation delay past the retirement of the HW_RET
instruction. In these cases, instead of using a HW_RET, a suggested method of ensuring
the ordering is coding a group of 5 fetch blocks, where the first contains the
HW_MTPR to the IPR, the second contains a HW_MTPR to the same IPR or one in the
same scoreboard group, and where the following 3 fetch blocks each contain at least
one non-NOP instruction. See Appendix D for a listing of cases where this method is
recommended.


6.5.6 Correct Ordering of Explicit Readers Followed by Implicit Writers


Certain IPRs that are updated as a result of faulting memory operations require PAL- 
code assistance to maintain ordering against newer instructions. Consider the following 
code sequence: 


        HW_MFPR IPR_MM_STAT
        LDQ rx, (ry)

It is typically the case that these instructions would issue in-order:


• The MFPR is data-ready and both instructions use a lower subcluster. However, the
  HW_MFPRs (and HW_MTPRs) respond to certain resource-busy indications and
  do not issue when the MBOX informs the IBOX that a certain set of resources
  (store bubbles) are busy.

• The LDs respond to a different set of resource-busy indications (load-bubbles) and
  could issue around the HW_MFPR in the presence of the former. PALcode assistance
  is required to enforce the issue order.


One totally reliable method is to insert an MB (memory barrier) instruction before the 
first load that occurs after the HW_MFPR MM_STAT. Another method would be to 
force a register dependency between the HW_MFPR and the LD. 


6.6 PALshadow Registers 


The 21264A contains eight extra virtual integer registers, called shadow registers, 
which are available to PALcode for use as scratch space and storage for commonly used 
values. These registers are made available under the control of the SDE[1] field of the 
I_CTL IPR. These shadow registers overlay R4 through R7 and R20 through R23, 
when the CPU is in PALmode and SDE[1] is set. 
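
The overlay condition can be written as a one-line predicate. The Python sketch below is illustrative only; the function name and argument names are hypothetical.

```python
# Integer registers that the shadow registers overlay: R4-R7 and R20-R23.
SHADOWED = frozenset(range(4, 8)) | frozenset(range(20, 24))


def uses_shadow_register(reg, palmode, sde1):
    """True if a reference to integer register `reg` selects a PALshadow register.

    palmode: the CPU is in PALmode.
    sde1:    I_CTL[SDE[1]] is set.
    """
    return bool(palmode and sde1 and reg in SHADOWED)


if __name__ == "__main__":
    print(sorted(SHADOWED))
    print(uses_shadow_register(23, palmode=True, sde1=True))
```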


PALcode generally runs with shadow mode enabled. Any PALcode that supports 
CALL_PAL instructions must run in that mode because the hardware writes a 
PALshadow register with the return address of CALL_PAL instructions. 


PALcode may occasionally be required to toggle shadow mode to obtain access to the
overlaid registers. See the PALcode restriction, Updating I_CTL[SDE], in Section
D.32.


6.7 PALcode Emulation of the FPCR 


The FPCR register contains status and control bits. They are accessed by way of the 
MT_FPCR and MF_FPCR instructions. The register is physically implemented like an 
explicitly written IPR. It may be written with a value from the floating-point register 
file by way of the MT_FPCR instruction. Architecturally-compliant FPCR behavior 
requires PALcode assistance. The FPCR register must operate as listed here: 


1. Correct operation of the status bits, which must be set when a floating-point 
instruction encounters an exceptional condition, independent of whether a trap for 
the condition is enabled. 


2. Correct values must be returned when the FPCR is read by way of a MF_FPCR
instruction.


3. Correct actions must occur when the FPCR is written by way of a MT_FPCR
instruction.


6.7.1 Status Flags 


The FPCR status bits in the 21264A are set with PALcode assistance. Floating-point 
exceptions, for which the associated FPCR status bit is clear or for which the associated 
trap is enabled, result in a hardware trap to the ARITH PALcode routine. The 
EXC_SUM register contains information to allow this routine to update the FPCR 
appropriately, and to decide whether to report the exception to the operating system. 
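
The trap condition in the paragraph above reduces to a small predicate, sketched here in Python. The function and argument names are hypothetical; the sketch only restates the condition and does not model how the ARITH routine updates the FPCR.

```python
def traps_to_arith(status_bit_set, trap_enabled):
    """True if a floating-point exception causes a hardware trap to ARITH.

    status_bit_set: the FPCR status bit associated with the exception is
                    already set.
    trap_enabled:   the trap associated with the exception is enabled.
    """
    return (not status_bit_set) or trap_enabled


if __name__ == "__main__":
    for status in (False, True):
        for enabled in (False, True):
            print(status, enabled, traps_to_arith(status, enabled))
```

The only combination that does not trap is a disabled trap whose status bit is already set, which is the case where PALcode has nothing left to record.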


6.7.2 MF_FPCR 


The MF_FPCR is issued from the floating-point queue and executed by the Fbox. No 
PALcode assistance is required. 


6.7.3 MT_FPCR 


The MT_FPCR instruction is issued from the floating-point queue. This instruction is 
implemented as an explicit IPR write operation. The value is written into the first latch, 
and when the instruction is retired, the value is written into the second latch. There is no 
IPR scoreboarding mechanism in the floating-point queue, so PALcode assistance is 
required to ensure that subsequent readers of the FPCR get the updated value. 


After writing the first latch, the MT_FPCR instruction invokes a synchronous trap to 
the MT_FPCR PALcode entry point. The PALcode can return using a HW_RET 
instruction with its STALL bit set. This sequence ensures that the MT_FPCR instruc- 
tion will be correctly ordered for subsequent readers of the FPCR. 


6.8 PALcode Entry Points 


PALcode is invoked at specific entry points, of which there are two classes: 
CALL_PAL and exceptions. 


6.8.1 CALL_PAL Entry Points 


CALL_PAL entry points are used whenever the Ibox encounters a CALL_PAL instruction
in the Istream. To speed the processing of CALL_PAL instructions, CALL_PAL
instructions do not invoke pipeline aborts but are processed as normal jumps to an
offset from the contents of the PAL_BASE register; the offset is specified by the
CALL_PAL instruction's function field.


The Ibox fetches a CALL_PAL instruction, bubbles one cycle, and then fetches the 
instructions at the CALL_PAL entry point. For convenience of implementation, returns 
from CALL_PAL are aided by a linkage register (much like JSRs). PALshadow register
R23 is used as the linkage register. The Ibox loads the PC of the instruction after the
CALL_PAL instruction into the linkage register. Bit [0] of the linkage register is set if
the CALL_PAL instruction was executed while the processor was in PALmode.


The Ibox pushes the value of the return PC onto the return prediction stack. 
CALL_PAL instructions start at the following offsets: 


• Privileged CALL_PAL instructions start at offset 2000₁₆.
• Nonprivileged CALL_PAL instructions start at offset 3000₁₆.




Each CALL_PAL instruction includes a function field that is used to calculate the PC of 
its associated PALcode entry point. The PALcode OPCDEC exception flow will be 
invoked if the CALL_PAL function field satisfies any of the following requirements: 


• Is in the range of 40₁₆ to 7F₁₆ inclusive

• Is greater than BF₁₆

• Is between 00₁₆ and 3F₁₆ inclusive, and IER_CM[CM] is not equal to the kernel
  mode value 0


If none of the conditions above are met, the PALcode entry point PC is as follows: 


PC[63:15] = PAL_BASE[63:15]
PC[14]    = 0
PC[13]    = 1
PC[12]    = CALL_PAL function field [7]
PC[11:6]  = CALL_PAL function field [5:0]
PC[5:1]   = 0
PC[0]     = 1 (PALmode)
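
The OPCDEC conditions and the bit assignments above can be combined into one Python function. This is a sketch under stated assumptions: the function name is hypothetical, PAL_BASE is treated as a 64-bit unsigned value, and the kernel_mode argument stands for IER_CM[CM] being equal to 0.

```python
def call_pal_entry_pc(pal_base, function, kernel_mode):
    """Return the CALL_PAL entry PC, or None if the OPCDEC flow is invoked."""
    if 0x40 <= function <= 0x7F or function > 0xBF:
        return None                         # reserved function ranges
    if function <= 0x3F and not kernel_mode:
        return None                         # privileged function outside kernel mode
    pc = pal_base & 0xFFFFFFFFFFFF8000      # PC[63:15] = PAL_BASE[63:15], PC[14] = 0
    pc |= 1 << 13                           # PC[13] = 1
    pc |= ((function >> 7) & 1) << 12       # PC[12] = function field [7]
    pc |= (function & 0x3F) << 6            # PC[11:6] = function field [5:0]
    pc |= 1                                 # PC[0] = 1 (PALmode); PC[5:1] = 0
    return pc


if __name__ == "__main__":
    # IMB (function 86 hex) with an example PAL_BASE of 10000 hex.
    print(f"{call_pal_entry_pc(0x10000, 0x86, kernel_mode=False):X}")
```

With PAL_BASE equal to zero, function 00 maps to 2001₁₆ and function 80 maps to 3001₁₆, which matches the privileged and nonprivileged starting offsets of 2000₁₆ and 3000₁₆ with the PALmode bit set.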


6.8.2 PALcode Exception Entry Points 


When hardware encounters an exception, Ibox execution jumps to a PALcode entry
point at a PC determined by the type of exception. The return PC of the instruction that 
triggered the exception is placed in the EXC_ADDR register and onto the return predic- 
tion stack. 


Table 6-8 shows the PALcode exception entry locations and their offset from the 
PAL_BASE IPR. The entry points are listed in decreasing order of priority. 


Table 6-8 PALcode Exception Entry Locations


Entry Name      Type         Offset₁₆  Description
DTBM_DOUBLE_3   Fault        100       Dstream TB miss on virtual page table entry fetch. Use
                                       three-level flow.
DTBM_DOUBLE_4   Fault        180       Dstream TB miss on virtual page table entry fetch. Use
                                       four-level flow.
FEN             Fault        200       Floating point disabled.
UNALIGN         Fault        280       Unaligned Dstream reference.
DTBM_SINGLE     Fault        300       Dstream TB miss.
DFAULT          Fault        380       Dstream fault or virtual address sign check error.
OPCDEC          Fault        400       Illegal opcode or function field:
                                       • Opcode 1, 2, 3, 4, 5, 6 or 7
                                       • Opcode 19₁₆, 1B₁₆, 1D₁₆, 1E₁₆ or 1F₁₆, not PALmode or
                                         not I_CTL[HWE]
                                       • Extended precision IEEE format
                                       • Unimplemented function field of opcodes 14₁₆ or 1C₁₆
IACV            Fault        480       Istream access violation or virtual address sign check error.
MCHK            Interrupt    500       Machine check.
ITB_MISS        Fault        580       Istream TB miss.
ARITH           Synch. Trap  600       Arithmetic exception or update to FPCR.
INTERRUPT       Interrupt    680       Interrupts: hardware, software, and AST.
MT_FPCR         Synch. Trap  700       Invoked when a MT_FPCR instruction is issued.
RESET/WAKEUP    Interrupt    780       Chip reset or wake-up from sleep mode.

6.9 Translation Buffer (TB) Fill Flows 


This section shows the expected PALcode flows for DTB miss and ITB miss. Familiar- 
ity with 21264A IPRs is assumed. 


6.9.1 DTB Fill 


Figure 6-5 shows single-miss DTB instructions flow. 


Figure 6-5 Single-Miss DTB Instructions Flow Example 


        hw_mfpr  r23, EV6__EXC_ADDR          ; (0L) get exception address
        hw_mfpr  r4, EV6__VA_FORM            ; (4-7,1L) get vpte address
        hw_mfpr  r5, EV6__MM_STAT            ; (0L) get miss info
        hw_mfpr  r7, EV6__EXC_SUM            ; (0L) get exc_sum for ra
trap__dtbm_single_vpte:
        hw_ldq/v r4, (r4)                    ; (1L) get vpte
        blt      p_misc, trap__d1to1         ; (xU) [63]=1 => 1-to-1
        hw_mfpr  r6, EV6__VA                 ; (4-7,1L) get original va
        blbc     r4, trap__invalid_dpte      ; (xU) invalid => branch
        srl      r4, #7, r7                  ; get mb bit
        hw_mtpr  r6, EV6__DTB_TAG0           ; (2&6,0L) write tag0
        hw_mtpr  r6, EV6__DTB_TAG1           ; (1&5,1L) write tag1
        hw_mtpr  r4, EV6__DTB_PTE0           ; (0&4,0L) write pte0
        hw_mtpr  r4, EV6__DTB_PTE1           ; (3&7,1L) write pte1

        ASSUME <tb_mb_en + pte_eco> ne 2
        .if ne pte_eco
        blbc     r7, trap__dtbm_single_mb    ; branch for mb
        hw_ret   (r23)                       ; return
trap__dtbm_single_mb:
        mb
        .endc
        hw_ret   (r23)                       ; (0L) return

The following list presents information about the single-miss DTB code example:


• In Figure 6-5, where (x,y) or (y) appear in the comments, x specifies the scoreboard
  bits and y specifies the Ebox subcluster.

• r4-r7 and r20-r23 are PALshadow registers.

• PALshadow r22 contains a flag that indicates whether the native code is running
  "1-to-1", that is, running in a mode where the physical address should be mapped
  1-to-1 to the virtual address, rather than being taken from a page table.

• IPR scoreboard bits [3:0] are used to order the restarted load or store instructions
  for the DTB write transactions.

• MM_STAT and VA will not be overwritten if the LD_VPTE instruction misses the
  DTB. There is no issue order constraint.

• The code is written to prevent a later execution of the DTB fill instruction from
  being issued before a previous execution and corrupting the previous write to the
  TB registers. The correct sequence of executions is accomplished by placing code
  dependencies on scoreboard bits [7:4] in the path of the successive writers. This
  prevents the successive writers from being issued before the previous writers are
  retired.

• When I_CTL[TB_MB_EN] = 1, the issue of MTPR DTB_PTE0 triggers, in hardware,
  a light-weight memory barrier (TB-MB). The light-weight memory barrier
  enforces read-ordering of store instructions from another processor (I) to this
  processor's (J) page table and this processor's virtual memory area such that if this
  processor sees the write to the PTE from (I) it will see the new data.

      Processor I     Processor J

      Wr Data         LD/ST
      MB              <tb miss>
      Wr PTE          LD-PTE, write TB
                      LD/ST

• The conditional branch is placed in the code so that all of the MTPR instructions
  are issued and retired or none of them are issued and retired. This allows the TB fill
  hardware to update the TB whenever it sees the retiring of PTE1 and to ignore
  writes to TAG0/TAG1/PTE0/PTE1 in the interim between the issuing of those
  writes and a retire of PTE1.

• As an alternative to using I_CTL[TB_MB_EN] = 1 to enforce read ordering,
  I_CTL[TB_MB_EN] can be set to 0 and the PALcode may use a bit in the PTE to
  indicate whether to do an explicit MB. The flow example in Figure 6-5 assumes
  this alternative.


• The value in DTB_PTEx[GH] determines whether the scoreboard mechanism alone
  is sufficient to guarantee all subsequent load/store instructions (implicit readers of
  the DTB) are ordered relative to the creation of a new DTB entry; whether all
  subsequent loads and stores to the loaded address will hit in the DTB.

  - If DTB_PTEx[GH] is zero, the scoreboard mechanism alone is sufficient.

  - If DTB_PTEx[GH] is not zero, the scoreboard mechanism alone is not sufficient
    (although this is not a problem). In this case, the new DTB entry is not
    visible to subsequent load/store instructions until after the MTPR DTB_PTE1
    retires.

    Issuing a HW_RET_STALL instead of a HW_RET would guarantee ordering,
    but is not necessary. Code executes correctly without the stall although execution
    might result in two passes through the DTB miss flow, rather than one,
    because the re-execution of the memory operation after the first DTB miss
    might miss again.

    This behavior is functionally correct because DTB loads that tag-match an
    existing DTB entry are ignored by the 21264A and the second DTB miss execution
    will load exactly the same entry as the first.


6.9.2 ITB Fill


Figure 6-6 shows the ITB miss instructions flow.


Figure 6-6 ITB Miss Instructions Flow Example 


        hw_mfpr  r4, EV6__IVA_FORM               ; (0L) get vpte address
        hw_mfpr  r23, EV6__EXC_ADDR              ; (0L) get exception address
        lda      r6, ^x0FFF(r31)                 ; (xU) create mask for prot
        bis      r31, r31, r31                   ; (xU) fill out fetch block
trap__itb_miss_vpte:
        hw_ldq/v r4, (r4)                        ; (xL) get vpte
        and      r4, r6, r5                      ; (xL) get prot bits
        blt      p_misc, trap__i1to1             ; (xU) 1-to-1 => branch
        srl      r4, #OSF_PTE__PFN__S, r6        ; (xU) shift PFN to <0>
        sll      r6, #EV6__ITB_PTE__PFN__S, r6   ; (xU) shift PFN into place
        and      r4, #<1@OSF_PTE__FOE__S>, r7    ; (xL) get FOE bit
        blbc     r4, trap__invalid_ipte          ; (xU) invalid => branch
        bne      r7, trap__foe                   ; (xU) FOE => branch
        srl      r4, #7, r7                      ; check for mb bit
        bis      r5, r6, r6                      ; (xL) PTE in ITB format
        hw_mtpr  r23, EV6__ITB_TAG               ; (6,0L) write tag
        hw_mtpr  r6, EV6__ITB_PTE                ; (0&4,0L) write PTE

        ASSUME <tb_mb_en + pte_eco> ne 2
        .if ne pte_eco
        blbc     r7, trap__itb_miss_mb           ; branch for mb
        hw_ret_stall (r23)                       ; (0L)
trap__itb_miss_mb:
        mb
        .endc
        hw_ret_stall (r23)                       ; (0L)
The following list presents information about the ITB miss flow code example:


• In Figure 6-6, where (x,y) or (y) appear in the comments, x specifies the scoreboard
  bits and y specifies the Ebox subcluster.


• The ITB is only accessed on Icache misses.

• r4-r7 and r20-r23 are PALshadow registers.


• PALshadow r22 contains a flag that indicates whether the native code is running
  "1-to-1", that is, running in a mode where the physical address should be mapped
  1-to-1 to the virtual address, rather than being taken from a page table.


• The HW_RET instruction should have its STALL bit set to ensure that the restarted
  Istream does not read the ITB until the ITB is written.


As an alternative to using I_CTL[TB_MB_EN] = 1 to enforce read ordering,
I_CTL[TB_MB_EN] can be set to 0 and the PALcode may use a bit in the PTE to
indicate whether to do an explicit MB. The flow example in Figure 6-6 assumes
this alternative.
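

The data path of the ITB miss flow (validity check, FOE check, protection mask, and PFN repositioning) can be modelled in a few lines of Python. This is only a sketch: the field positions OSF_PTE_PFN_S, EV6_ITB_PTE_PFN_S, and OSF_PTE_FOE_S below are assumed values chosen for illustration, not values stated in this section, and the 1-to-1 branch and the MB handling are left out.

```python
# Field positions are illustrative assumptions; take the real values from
# the PTE and ITB_PTE definitions before relying on this model.
OSF_PTE_PFN_S = 32       # assumed: PFN starts at bit 32 of the memory PTE
EV6_ITB_PTE_PFN_S = 13   # assumed: PFN starts at bit 13 of ITB_PTE
OSF_PTE_FOE_S = 3        # assumed: position of the fault-on-execute bit
PROT_MASK = 0x0FFF       # mask built by the lda instruction in Figure 6-6


def itb_fill(pte):
    """Model what the miss flow does with one virtual PTE value."""
    if pte & 1 == 0:                       # blbc r4: low bit clear => invalid
        return {"action": "trap__invalid_ipte"}
    if pte & (1 << OSF_PTE_FOE_S):         # bne r7: FOE set => fault
        return {"action": "trap__foe"}
    prot = pte & PROT_MASK                 # and: keep the protection bits
    pfn = pte >> OSF_PTE_PFN_S             # srl: shift PFN down to bit 0
    itb_pte = prot | (pfn << EV6_ITB_PTE_PFN_S)   # sll + bis: ITB format
    return {"action": "write_itb", "itb_pte": itb_pte}
```

For example, `itb_fill((0x1234 << 32) | 0x301)` takes the normal path and returns the PFN 0x1234 moved to the assumed ITB_PTE position with the low protection bits preserved.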


6.10 Performance Counter Support 


The 21264A provides hardware support for two methods of obtaining program perfor- 
mance feedback information. The two methods do not require program modification. 
Instead, performance monitoring utilities make calls to the PALcode to set up the 
counters and contain interrupt handlers that call PALcode to retrieve the collected data. 
The first method, Aggregate mode, offers capabilities that are similar to earlier micro- 
processor performance counters. This mode counts events when enabled, until it over- 
flows, causing an interrupt that can retrieve the collected data. The second method, 
ProfileMe mode, supports a new way of statistically sampling individual instructions 
during program execution. This mode counts events triggered by a targeted inflight 
instruction. 


Counter support uses the hardware registers listed in Table 6-9. 


Table 6-9 IPRs Used for Performance Counter Support 


Register Name                   Mnemonic    Relevant Fields                       Described in Section

ProfileMe PC                    PMPC        All fields                            5.2.6
Interrupt enable and current    IER_CM      PCEN[1:0]                             5.2.9
processor mode
Interrupt summary               ISUM        PC[1:0]                               5.2.14
Ibox control                    I_CTL       SPCE, PCT0_EN, PCT1_EN                5.2.15
Ibox status                     I_STAT      OVR, ICM, TRAP_TYPE, LSO, TRP, MIS    5.2.16
Ibox process context            PCTX        PPCE                                  5.2.21
Performance counter support     PCTR_CTL    All fields                            5.2.22
6.10.1 General Precautions


Initialize both counters (PCTR_CTL[PCTR0] and PCTR_CTL[PCTR1]) to zero in reset PALcode
to avoid spurious interrupts when exiting initial PALcode. Counters must be written
twice during initialization to ensure that the overflow latch has been cleared (see the
PALcode restrictions in Sections D.28 and D.34).


The counters should never be left within one cycle of overflow when disabled because 
that can cause some interrupts to be blocked in anticipation of an overflow interrupt 
(see PALcode restriction 32). 


If a counter is at the overflow threshold and a value is written to that counter, the 
counter signals an overflow interrupt upon leaving PALmode, even if that counter is 
disabled. To avoid that interrupt, the PALcode should clear the interrupt by writing to 
HW_INT_CLR. 


Interrupts are disabled in PALmode. 


As a quirk of the implementation, while counting is disabled, a read of PCTR_CTL can
yield value + increment, where value is the actual value in PCTR_CTL, the increment
for PCTR0 is in the range 0..4 (retired instructions in that cycle), and the increment
for PCTR1 depends on SL1.


6.10.2 Aggregate Mode Programming Guidelines 


Use the following information to program counters in Aggregate mode. 


6.10.2.1 Aggregate Mode Precautions 


Counters continue to count after overflow.

Only the counters return useful data. See Table 6-11 for counting modes.

Counters can be read by a PALcode instruction at any time to get the aggregate count.

The legal range for PCTR0 when writing the IPR is 0:(2**20-16).

The legal range for PCTR1 when writing the IPR is 0:(2**20-4).
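

As an illustration of these write ranges, the following Python sketch computes a starting value that makes a 20-bit counter overflow after roughly a chosen number of events, and rejects values outside the legal range. The helper names are hypothetical; only the 20-bit width and the two upper limits come from the text above.

```python
COUNTER_BITS = 20
# Headroom below 2**20 required when writing each counter (from the text).
HEADROOM = {0: 16, 1: 4}


def max_start_value(counter):
    """Largest value that may legally be written to PCTR0 or PCTR1."""
    return (1 << COUNTER_BITS) - HEADROOM[counter]


def start_value_for_period(counter, events):
    """Starting value so the counter overflows after about `events` counts."""
    value = (1 << COUNTER_BITS) - events
    if not 0 <= value <= max_start_value(counter):
        raise ValueError("start value outside the legal write range")
    return value
```

A handler that wants an interrupt about every 65,536 events on counter 0 would write `start_value_for_period(0, 65536)`; a request for a period of 8 events on counter 0 is rejected because it falls inside the 16-count headroom.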


6.10.2.2 Operation 


1. Setup

The following IPRs need to be set up by PALcode instructions.


IPR Name    Relevant Fields    Meaning

IER_CM      PCEN[1:0]          Enable Interrupts.
PCTX        PPCE               Enable Process Performance Counting or use I_CTL[SPCE].
PCTR_CTL    SL0                Selects Aggregate or ProfileMe mode; set to 0 for Aggregate mode.
            SL1                Selects PCTR0 and PCTR1 counting modes. See Table 6-11 for more
                               information.
            PCTR0[19:0]        Set counter 0 starting value [0:(2**20-16)]. See Section 6.10.1 for
                               setup precautions.
            PCTR1[19:0]        Set counter 1 starting value [0:(2**20-4)]. See Section 6.10.1 for
                               setup precautions.
I_CTL       SPCE               Enable System Performance Counting or use PCTX[PPCE].
            PCT0_EN            Enable performance counter 0.
            PCT1_EN            Enable performance counter 1.

2. Count

If PCTR0 and PCTR1 are enabled, they increment according to the modes selected by
SL0 and SL1.


3. Overflow

If PCEN[1:0] is enabled, PC[1:0] is set when PCTR0 or PCTR1 overflows.


4. Hardware interrupt


When PC[1:0] is set, the PALcode interrupt routine is entered. Interrupt is acknowl- 
edged and PALcode generates an interrupt to the operating system performance 
monitoring utility. 


5. Operating system interrupt handler 


The handler should read the IPR PCTR_CTL, as shown in Table 6-10, to note 
which counter overflowed in the handler's data structures. The handler may read the 
counter to see how many events have happened since the overflow. 


The handler may also choose to write the counters to control the frequency of inter- 
rupts. 


Table 6-10 Aggregate Mode Returned IPR Contents


IPR         Field          Contents

PCTR_CTL    PCTR0[19:0]    Counter #0 value
            PCTR1[19:0]    Counter #1 value


6.10.2.3 Aggregate Counting Mode Description


6.10.2.3.1 Cycle counting


Counts cycles. PCTR0 is incremented by the number of cycles counted, that is, 1.


6.10.2.3.2 Retired instructions cycles


PCTR0 is incremented by up to 8 retired instructions per cycle when enabled via
I_CTL[PCT0_EN] and either I_CTL[SPCE] or PCTX[PPCE]. On overflow, an interrupt
is triggered as ISUM[PC0] if enabled via IER_CM[PCEN0].


The 21264A can retire up to 11 instructions per cycle, which exceeds PCTR0's maximum
increment of 8 per cycle. However, no retires go uncounted because the 21264A
cannot sustain 11 retires per cycle, and the 21264A corrects PCTR0 in subsequent
cycles.


A squashed instruction does not count as a retire. 
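

One way to picture why no retires are lost is a counter that adds at most 8 per cycle and carries any excess into later cycles. The Python sketch below models that idea; the carry-forward mechanism is an assumption made for illustration, since the text only states that PCTR0 is corrected in subsequent cycles.

```python
def count_retires(retired_per_cycle, max_increment=8):
    """Return per-cycle counter increments, carrying any excess forward.

    Assumed model: retires beyond `max_increment` in one cycle wait in a
    backlog and are added to the counter in following cycles.
    """
    backlog = 0
    increments = []
    for retired in retired_per_cycle:
        backlog += retired
        step = min(backlog, max_increment)
        increments.append(step)
        backlog -= step
    return increments, backlog
```

With retire counts of 11, 2, and 0 in three consecutive cycles, the model adds 8, 5, and 0, so all 13 retires are counted by the end of the second cycle.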
6.10.2.3.3 Bcache miss or long latency probes cycles

This input counts the number of times the Bcache result was a miss. 


Essentially, a long latency probe is a data request from other processes that cause 
Bcache misses in a system. 


This count is phase shifted three cycles early and thus includes events that occurred 
three cycles before the start and before the end of the ProfileMe window. 


6.10.2.3.4 Mbox replay traps cycles 
This input counts Mbox replay traps. 


6.10.2.4 Counter Modes for Aggregate Mode 


Table 6-11 shows the counter modes that are used with Aggregate mode.


Table 6-11 Aggregate Mode Performance Counter IPR Input Select Fields


SL0[4]    SL1[3:2]    PCTR0                   PCTR1

0         00          Retired instructions    Cycle counting
0         01          Cycle counting          Not defined
0         10          Retired instructions    Bcache miss or long latency probes
0         11          Cycle counting          Mbox replay traps


6.10.3 ProfileMe Mode Programming Guidelines


Use the following information to program counters in ProfileMe mode.


6.10.3.1 ProfileMe Mode Precautions


Squashed NOPs count as valid fetched instructions.

Counter 1 must be explicitly cleared in the trap handler before each data collection.

The CMOV instruction is decomposed into two valid fetched instructions that, in the
absence of stalls, are fetched in consecutive cycles. See Table 6-12 for more
information.


Table 6-12 CMOV Decomposed


Instruction             New Instructions

CMOV Ra, Rb --> Rc      CMOV1 Ra, oldRc --> newRc1
                        CMOV2 newRc1, Rb --> newRc2


6.10.3.2 Operation


1. Setup

The following IPRs need to be set up by using PALcode instructions.


IPR Name    Relevant Fields    Meaning

IER_CM      PCEN[1:0]          Enable Interrupts.
PCTX        PPCE               Enable Process Performance Counting or use I_CTL[SPCE].
PCTR_CTL    SL0                Selects Aggregate or ProfileMe mode; set to 1 for ProfileMe mode.
            SL1                Selects PCTR0 and PCTR1 counting modes. See Table 6-14 for more
                               information.
            PCTR0[19:0]        Set counter 0 value (2**20-N). This selects approximately the Nth valid
                               fetched instruction as the profiled instruction. Because writes to PCTR0
                               are incremented by 0..4, the profiled instruction is one of the (N-4)th to
                               Nth valid fetched instructions. See Section 6.10.1 for more setup
                               precautions.
            PCTR1[19:0]        Set counter 1 value = 0. See Section 6.10.1 for more setup precautions.
I_CTL       SPCE               Enable System Performance Counting or use PCTX[PPCE].
            PCT0_EN            Enable performance counter 0.
            PCT1_EN            Enable performance counter 1.


2. Open window


PCTR0 accumulates up to 4 valid fetched instructions per cycle when enabled via
I_CTL[PCT0_EN] and either I_CTL[SPCE] or PCTX[PPCE].


The valid fetched instruction that causes PCTR0 to overflow becomes the profiled
instruction and opens the window, which covers a period of time close to when the
instruction was in flight. The first cycle of the window is the 5th cycle after the
instruction was fetched. A residual count of up to 7 valid fetched instructions is
accumulated in PCTR0 in the two cycles between overflow and the start of the
ProfileMe window. This residual count is returned in I_STAT[overcount(2,0)].


3. Count


If PCTR0 and PCTR1 are enabled, they increment according to the modes selected by
SL0 and SL1.


4. End window 


The last cycle of the window depends on whether the instruction traps, retires, 
aborts, and/or is squashed by the fetcher. 


For instructions that cause a trap, the last cycle in the window is the 2nd cycle after 
the trap. Mispredicted branches are included in this category. 


For nontrapping instructions that retire, the last cycle in the window is the 2nd 
cycle after the instruction retires. 


For instructions that abort, the last cycle in the window is the 2nd cycle after the 
trap that caused the abort. 


For instructions that are squashed (such as TRAPB), the last cycle in the window is 
approximately the 2nd cycle after the squashed instruction would have aborted or 
retired. 


Every non-squashed valid fetched instruction either aborts or retires, but not both. 
In either case, the instruction may also trap. 
PCTR0 is disabled from counting until PCTR_CTL is next written.


5. Interrupt PALcode


When ISUM field PC[1:0] is set, the interrupt PALcode for PCTR0 or PCTR1 is executed.


6. Operating system interrupt handler


The handler should first read the IPRs in Table 6-13 and then write PCTR_CTL to set
up the next interrupt.
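

The relationship between the value written to PCTR0 and the instruction that ends up being profiled can be sketched as follows. The function name is hypothetical; the arithmetic follows the setup description above (write 2**20-N, and the profiled instruction is one of the (N-4)th to Nth valid fetched instructions).

```python
def profileme_sample_window(n):
    """Return the PCTR0 start value for sampling about the Nth fetched
    instruction, plus the range of instruction indices that may be profiled.
    """
    start = (1 << 20) - n
    if not 0 <= start <= (1 << 20) - 16:   # legal PCTR0 write range
        raise ValueError("N gives a PCTR0 value outside the legal range")
    # Writes to PCTR0 are incremented by 0..4, so the profiled instruction
    # lies between the (N-4)th and the Nth valid fetched instruction.
    return start, range(n - 4, n + 1)
```

For N = 1000 the handler would write 1047576 to PCTR0, and the sample lands on one of valid fetched instructions 996 through 1000.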


Table 6-13 ProfileMe Mode Returned IPR Contents


IPR Name      Relevant Fields    Meaning

PMPC[63:0]    All                Profiled PC.
I_STAT        ICM                Instruction was in a new Icache fill stream.
              TRP                Instruction caused a trap and was not in the shadow of
                                 a younger trapping instruction.
              MIS                Conditional branch mispredict.
              TRAP_TYPE          Exception type code.
              LSO                Load-store order replay trap.
              OVR                Counter 0 overcount.
PCTR_CTL      VAL                Instruction retired valid.
              TAK                Branch direction if instruction is a conditional branch.
              PM_STALLED         Instruction stalled for at least one cycle between fetch
                                 and map stages of pipeline.
              PM_KILLED_BM       Instruction killed during or before cycle in which it
                                 was mapped.
              PCTR0[19:0]        Counter 0 value.
              PCTR1[19:0]        Counter 1 value.


6.10.3.3 ProfileMe Counting Mode Description


6.10.3.3.1 Cycle counting


In ProfileMe mode, either counter counts cycles during the window of the profiled
instruction.


6.10.3.3.2 Inum retire delay cycles 


This input is used to measure a lower bound on the inum retire delay of the profiled
instruction. The maximum final value of PCTR1 is the length of the ProfileMe window
minus 2.


Counts cycles that a profiled instruction delayed the retire pointer advance during the
ProfileMe window. The 21264A tracks instructions in the pipeline by allocating them
"inums" near the front of the pipeline. All inums are retired at the end of the pipeline
in the order in which they were allocated. Inums are allocated in batches of four, so
there may be more inums allocated than there are program instructions in flight. Every
inum is retired in order, including those for aborted instructions.


The "retire pointer" points to the next inum to be retired. An inum retires in the cycle 
that the retire pointer advances past the inum. 


Let X and Y be consecutive inums in the allocation order. The "inum retire delay" of Y 
is [(cycle in which Y retired) — (cycle in which X retired)]. A large inum retire delay 
indicates a possible performance bottleneck (for example, an instruction stalled on a 
data cache miss). 
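

The definition of inum retire delay translates directly into a difference of consecutive retire cycles. The following Python sketch assumes a hypothetical list of retire cycles, given in inum allocation order.

```python
def inum_retire_delays(retire_cycles):
    """retire_cycles[i] is the cycle in which the i-th allocated inum retired.

    The delay of each inum Y is (cycle Y retired) - (cycle X retired), where
    X is the inum allocated immediately before Y.
    """
    return [later - earlier
            for earlier, later in zip(retire_cycles, retire_cycles[1:])]
```

For retire cycles 10, 10, 11, 40 the delays are 0, 1, and 29; the last inum stands out as the likely bottleneck.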


6.10.3.3.3 Retired instructions cycles 


When counting retired instructions in ProfileMe mode, the final count in PCTR0 may
include instructions that retired before the ProfileMe window and may exclude
instructions that retired near the end of the ProfileMe window. These discrepancies are
caused by a variable delay between the time that an instruction retires and the time that
PCTR0 is incremented for that retire. This discrepancy is in the range of plus or minus
4 retired instructions.


6.10.3.3.4 Bcache miss or long latency probes cycles 
This input counts the number of times the Bcache result was a miss. 


Essentially, a long latency probe is a data request from other processes that cause 
Bcache misses in a system. 


This count is phase shifted three cycles early and thus includes events that occurred 
three cycles before the start and before the end of the ProfileMe window. 


6.10.3.3.5 Mbox replay traps cycles 
This input counts Mbox replay traps. 


PCTR1 is enabled to count Mbox replay traps that occur during a window that is the 
ProfileMe window phase-shifted one cycle later. The first replay trap counted would be 
the 7th cycle after the instruction is fetched. 


6.10.3.4 Counter Modes for ProfileMe Mode 


Table 6-14 shows the counter modes that are used with ProfileMe mode.


Table 6-14 ProfileMe Mode PCTR_CTL Input Select Fields


SL0[4]    SL1[3:2]    PCTR0                   PCTR1

1         00          Retired instructions    Cycle counting
1         01          Cycle counting          Inum retire delay
1         10          Retired instructions    Bcache miss or long latency probes
1         11          Cycle counting          Mbox replay traps
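

The two input select tables (Table 6-11 and Table 6-14) can be combined into one lookup keyed by SL0 and SL1. The sketch below assumes, as the table headings suggest, that SL0 is bit 4 and SL1 is bits 3:2 of the value being decoded; the function name is hypothetical.

```python
# (SL0, SL1) -> (PCTR0 input, PCTR1 input); None marks "Not defined".
COUNTER_MODES = {
    (0, 0b00): ("Retired instructions", "Cycle counting"),
    (0, 0b01): ("Cycle counting", None),
    (0, 0b10): ("Retired instructions", "Bcache miss or long latency probes"),
    (0, 0b11): ("Cycle counting", "Mbox replay traps"),
    (1, 0b00): ("Retired instructions", "Cycle counting"),
    (1, 0b01): ("Cycle counting", "Inum retire delay"),
    (1, 0b10): ("Retired instructions", "Bcache miss or long latency probes"),
    (1, 0b11): ("Cycle counting", "Mbox replay traps"),
}


def decode_select(value):
    """Decode the SL0/SL1 fields into (mode, PCTR0 input, PCTR1 input)."""
    sl0 = (value >> 4) & 0b1      # SL0[4]
    sl1 = (value >> 2) & 0b11     # SL1[3:2]
    pctr0, pctr1 = COUNTER_MODES[(sl0, sl1)]
    mode = "ProfileMe" if sl0 else "Aggregate"
    return mode, pctr0, pctr1
```

For example, a value with SL0 = 1 and SL1 = 01 decodes to ProfileMe mode with PCTR0 counting cycles and PCTR1 counting inum retire delay.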
Initialization and Configuration


This chapter provides information on 21264A-specific microprocessor system
initialization and configuration. It is organized as follows:


• Power-up reset flow
• Fault reset flow
• Energy Star certification and sleep mode flow
• Warm reset flow
• Array initialization
• Initialization mode processing
• External interface initialization
• Internal processor register (IPR) reset state
• IEEE 1149.1 test port reset
• Reset state machine state transitions
• Phase-locked loop (PLL) functional description


Initialization is controlled by the reset state machine, which is responsible for four
major operations. Table 7-1 describes the four major operations.


Table 7-1 21264A Reset State Machine Major Operations


Operation          Function

Ramp up            Sequence the PLL input and output dividers (X_div and Z_div) to gradually
                   raise the internal GCLK frequency and generate time intervals for the PLL
                   to re-establish lock.
BiST/SROM          Receive a synchronous transfer on the ClkFwdRst_H pin in order to start
                   built-in self-test and SROM load at a predictable GCLK cycle.
Clock forward      Receive a synchronous transfer on the ClkFwdRst_H pin in order to
interface          initialize the clock forwarding interface.
Ramp down          Sequence the PLL input and output dividers (X_div and Z_div) to gradually
                   lower the internal GCLK frequency during sleep mode.


7.1 Power-Up Reset Flow and the RESET_L and DCOK_H Pins 


The 21264A reset sequence is triggered using the two input signals Reset_L and 
DCOK_H in a sequence that is described in Section 7.1.1. After Reset_L is deasserted, 
the following sequence of operations takes place: 
1. The clock forwarding and system clock ratio configuration information is loaded
onto the 21264A. See Section 7.1.2.


2. The internal PLL is ramped up to operating frequency.


3. The built-in self-test (BiST) of the internal arrays is run, followed by Icache
initialization using an external serial ROM (SROM) interface.


The 21264A systems, unlike the Alpha 21064 and 21164 microprocessor systems, 
are required to have an SROM. The SROM provides the only means to configure 
the system port, and the SROM pins can be used as a software-controlled UART. 


The Icache must contain PALcode that starts at location 0x780. This code is used to 
configure the 21264A IPRs as necessary before causing any offchip read or write 
commands. This allows the 21264A to be configured to match the external system 
implementation. 


4. After configuring the 21264A, control can be transferred to code anywhere in 
memory, including the noncacheable regions. The Icache can be flushed by a write 
operation to the ITB invalidate-all register after control is transferred. This transfer 
of control should be to addresses not loaded in the Icache by the SROM interface or 
the Icache may provide unexpected instructions. 


5. Typically, any state required by the PALcode is initialized and then the console is 
started (switching out of PALmode and into native mode). The console code initial- 
izes and configures the system and boots an operating system from an I/O device 
such as a disk or the network. 


Figure 7-1 shows the sequence of events at power-up, or cold reset. In Figure 7-1, note
the following symbols for constraints and information:


Constraints:

A   Setup (A0) and hold (A1) for IRQs to be latched by DCOK (2 ns for each).
B   Enough time for Reset_L to propagate through 5 stages of RESET synchronizer (clocked by
    the internal framing clock, which is driven by EV6Clk_x). Worst case through Pass 3 of
    the 21264A would be 5x8x8 = 320 GCLK cycles, because Y_div values above 8 are out of
    range.
C   Min = 1 FrameClk cycle.


Information:

a   8 GCLK cycles from DCOK assertion to first "real" EV6Clk_x cycle.
b   Approximately 525 GCLK cycles for external framing clock to be sampled and captured.
c   1 FrameClk_x cycle.
d   3 FrameClk_x cycles.
e   Approximately 264 GCLK cycles to prevent first command from appearing too early.
f   Approximately 700,000 GCLK cycles for BiST + approximately 100,000 GCLK cycles fixed
    time + approximately 50,000 GCLK cycles per line of Icache for SROM load.
g   16 GCLK cycles.
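

As a rough worked example of the BiST and SROM load estimate in the information list above, the following Python sketch adds up the approximate cycle counts and converts them to time. The function names, the number of Icache lines, and the GCLK frequency are hypothetical inputs; the three cycle constants are the approximate figures quoted above.

```python
BIST_CYCLES = 700_000             # approximate GCLK cycles for BiST
SROM_FIXED_CYCLES = 100_000       # approximate fixed time for SROM load
SROM_CYCLES_PER_LINE = 50_000     # approximate cycles per Icache line loaded


def bist_and_srom_gclk_cycles(icache_lines):
    """Approximate GCLK cycles spent in BiST plus SROM load."""
    return BIST_CYCLES + SROM_FIXED_CYCLES + SROM_CYCLES_PER_LINE * icache_lines


def cycles_to_ms(cycles, gclk_mhz):
    """Convert a GCLK cycle count to milliseconds at the given frequency."""
    return cycles / (gclk_mhz * 1000)
```

Loading 16 Icache lines, for instance, gives about 1,600,000 GCLK cycles, or roughly 3.2 ms at an assumed 500 MHz GCLK.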
Figure 7-1 Power-Up Timing Sequence


7.1.1 Power Sequencing and Reset State for Signal Pins


Power sequencing and the avoidance of potential failure mechanisms are described in
Section 9.3.


The reset state for the signal pins is listed in Table 7-2. 


Table 7-2 Signal Pin Reset State 





Signal                  Reset State                             Signal                  Reset State

Bcache

BcAdd_H[23:4]           Tristated
BcCheck_H[15:0]         Tristated                               BcTagInClk_H            NA (input)
BcData_H[127:0]         Tristated                               BcTagOE_L               Tristated
BcDataInClk_H[7:0]      NA (input)                              BcTagOutClk_x           Tristated
BcDataOE_L              Tristated                               BcTagParity_H           Tristated
BcDataOutClk_x[3:0]     Tristated                               BcTagShared_H           Tristated
BcDataWr_L              Tristated                               BcTagValid_H            Tristated
BcLoad_L                Tristated                               BcTagWr_L               Tristated
BcTag_H[42:20]          Tristated                               BcVref                  NA (I_DC_REF)
BcTagDirty_H            Tristated

System Interface

IRQ_H[5:0]              NA (input)                              SysDataInClk_H[7:0]     NA (input)
SysAddIn_L[14:0]        NA (input)                              SysDataInValid_L        NA (input)
SysAddInClk_L           NA (input)                              SysDataOutClk_L[7:0]    Tristated
SysAddOut_L[14:0]       Initially, during power-up reset,       SysDataOutValid_L       NA (input)
                        state is not defined. If not during
                        power-up, preserves previous state.
                        Then, after the clock forward reset
                        period (as the external clocks
                        start), signal driven to NZNOP until
                        the reset state machine enters RUN,
                        when it is driven to NOP.
SysAddOutClk_L          Tristated                               SysFillValid_L          NA (input)
SysCheck_L[7:0]         Tristated                               SysVref                 NA (I_DC_REF)
SysData_L[63:0]         Tristated

Clocks

ClkFwdRst_H             NA (input)                              FrameClk_x              NA (input)
ClkIn_H, ClkIn_L        NA (input)                              PLL_VDD                 NA (I_DC_REF)
EV6Clk_H, EV6Clk_L      NA (input)

Miscellaneous

DCOK_H                  Must be deasserted until dc voltage     Tck_H                   NA (input)
                        reaches proper operating level.
PllBypass_H             NA (input)                              Tdi_H                   NA (input)
Reset_L                 NA (input)                              Tdo_H                   Unspecified
SromClk_H               Tristated                               TestStat_H              Tristated
SromData_H              NA (input)                              Tms_H                   NA (input)
SromOE_L                Tristated                               Trst_L                  NA (input)


In addition, as power is being ramped, Reset_L must be asserted — this allows the 
21264A to reset internal state. Once the target voltage levels are attained, systems 
should assert DCOK_H. This indicates to the 21264A that internal logic functions can 
be evaluated correctly and that the power-up sequence should be continued. Prior to 
DCOK_H being asserted, the logic internal to the 21264A is being reset and the inter- 
nal clock network is running (either clocked by the PLL VCO, which is at a nominal 
speed, or by ClkIn_H, if the PLL is bypassed). 


The reset state machine is in state WAIT_SETTLE. 


7.1.2 Clock Forwarding and System Clock Ratio Configuration 


When DCOK_H is asserted, the 21264A samples several pins and latches in some
initialization state, including the value of the PLL Y_div divisor, which specifies the
ratio of the system clock to the internal clock (see Section 7.11.2.3), and enables the
charge pump on the phase-locked loop.
Table 7-3 summarizes the pins and the suggested/required initialization state. Most of
this information is supplied by placing (switch-selectable or hardwired) weak pull-ups
or pull-downs on the IRQ_H pins. The IRQ_H pins are sampled on the rising edge of
DCOK_H, during which time the 21264A is in reset and is not generating any system
activity. During normal operation, the IRQ_H pins supply interrupt requests to the
21264A.


It is possible to disable the 21264A PLL and source GCLK directly from ClkIn_x.
This mode is selected via PllBypass_H. The 21264A still produces a divided-down
clock on EV6Clk_x; this output clock, which tracks GCLK, can be used in a feedback
loop to generate a locked input clock via an external PLL. The input clock can be
locked against a slower speed system reference clock.


Table 7-3 Pin Signal Names and Initialization State


Signal Name    Sample Time         Function                                        Value

PllBypass_H    Continuous input    Select ClkIn_x onto GCLK instead of             0  Bypass (1)
                                   internal PLL.                                   1  Use PLL

ClkFwdRst_H    Sampling method     -                                               -
               according to
               IRQ_H[4]

Reset_L        Continuous input    -                                               -

IRQ_H[5]       Rising edge of      Select 1:1 FrameClk mode.                       0  Sample with FrameClk_H
               DCOK_H              Internal FrameClk can be generated two ways:    1  Use a copy of EV6Clk_H
                                   1  By sampling FrameClk_H. Used if
                                      FrameClk_H is slower than ClkIn_H.
                                   2  As a direct copy of EV6Clk_H. Used if
                                      FrameClk_H is the same frequency as
                                      ClkIn_H or is DC.

IRQ_H[4]       Rising edge of      Select method of sampling ClkFwdRst_H to        0  Sample with External
               DCOK_H              produce internal ClkFwdRst, either with            FrameClk_x
                                   external or internal copy of FrameClk_x.        1  Sample with Internal
                                                                                      FrameClk

IRQ_H[3:0]     Rising edge of      Select Y_div divisor value. This is the         IRQ_H[3:0]  Divisor
               DCOK_H              divide-down factor between GCLK and             0011        3
                                   EV6Clk_x.                                       0100        4
                                   When the PLL is in use and the 21264A is        0101        5
                                   ramped-up to full speed, the VCO adjusts in     0110        6
                                   order to phase-align (and rate-match)           0111        7
                                   EV6Clk_x to ClkIn_x. When the PLL is not in     0000        8
                                   use, and ClkIn_x is bypassed onto GCLK,         1000        9
                                   EV6Clk_x is slower than ClkIn_x by the          1001        10
                                   divisor Y_div.                                  1010        11
                                                                                   1011        12
                                                                                   1100        13
                                                                                   1101        14
                                                                                   1110        15
                                                                                   1111        16

DCOK_H         Continuous input    When deasserted, initializes the internal       -
                                   21264A reset state machine and keeps the PLL
                                   internal oscillator running at a nominal
                                   speed. Assertion, which implies power to the
                                   21264A is good, causes configuration
                                   information to be sampled.

(1) The maximum permissible instantaneous change in ClkIn_x frequency is 333 MHz (to
prevent current spikes).


7.1.3 PLL Ramp Up 


After the configuration is loaded through the IRQ_H pins, the next phase in the power-up
flow is the internal PLL ramp up sequence. Ramping up of the PLL is required to
guarantee that the dynamic change in frequency will not cause the supply on the
21264A to fall due to the supply loop inductance. Clock control circuitry steps GCLK
from power-up/reset clocking to 1/16 operating frequency, to 1/2 operating frequency,
and finally normal operating frequency.


After the assertion of DCOK_H, the 21264A waits for the deassertion of Reset_L from
the system while the PLL attempts to achieve a lock. The PLL internal ramp dividers
are set to divide down the input clock by 16 and the PLL attempts to achieve lock
against an effective input frequency of ClkIn_x/16. Once lock is achieved, the actual
internal frequency (GCLK) is ClkIn_x*(Y_div divisor value)/16. There should be a
minimum delay of 100 ms between the assertion of DCOK_H and the deassertion of
Reset_L to allow for this locking. The reset state machine is in the WAIT_NOMINAL
state.


After the deassertion of Reset_L, the reset state machine goes into the RAMP1 state.
The 21264A ramps the internal frequency by changing the effective input frequency of
the PLL to ClkIn_x/2 for a sufficient lock interval (about 20 µs). The state machine
then goes into the RAMP2 state, changing the effective input frequency to ClkIn_x/1 for
an additional lock interval (about 20 µs). The lock periods are generated by the internal
duration counter, which is driven by GCLK. The counter counts 4108 GCLK cycles
during the ClkIn_x/2 lock interval. Note that GCLK is produced by the output of the
PLL, which is locking to an input clock that is 1/2 of the operating frequency;
therefore, the 4108 cycle interval constitutes a 12-20 µs interval when the operating
frequency is 400-666 MHz. Then, the counter counts 8205 GCLK cycles during the
ClkIn_x/1 lock interval.
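

The 12-20 µs figure can be checked with a short calculation. The Python sketch below uses the two cycle counts from the text; the function name is hypothetical, and the operating frequency is whatever full-speed GCLK frequency the system targets.

```python
HALF_SPEED_LOCK_CYCLES = 4108   # GCLK cycles counted during the ClkIn_x/2 interval
FULL_SPEED_LOCK_CYCLES = 8205   # GCLK cycles counted during the ClkIn_x/1 interval


def lock_intervals_us(operating_mhz):
    """Return (ClkIn_x/2 interval, ClkIn_x/1 interval) in microseconds."""
    # During the first interval GCLK runs at half the operating frequency.
    half_speed = HALF_SPEED_LOCK_CYCLES / (operating_mhz / 2)
    full_speed = FULL_SPEED_LOCK_CYCLES / operating_mhz
    return half_speed, full_speed
```

At 400 MHz both intervals come out at about 20.5 µs, and at 666 MHz at about 12.3 µs, which matches the 12-20 µs range quoted above.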


7.1.4 BiST and SROM Load and the TestStat_H Pin 


The 21264A uses the deassertion of ClkFwdRst_H (which must be deasserted for a
minimum of one FrameClk_H cycle and then reasserted) to begin built-in self-test
(BiST). The reset state machine goes into the WAIT_BiST state. Details on BiST are
given in Chapter 11. The power-up BiST lasts approximately 700,000 cycles. The result
of the self-test is made available on the TestStat_H pin. The pin is forced low by the
system reset. It is then forced high during BiST.


As BiST completes, the TestStat_H pin is held low for 16 GCLK cycles. Then, if BiST
succeeds, the pin remains low. Otherwise, it is asserted. After successfully completing
BiST, the 21264A then performs the SROM load sequence (described in Chapter 11).
After the SROM load sequence is finished, the 21264A deasserts SromOE_L.


7.1.5 Clock Forward Reset and System Interface Initialization 


After the deassertion of SromOE_L, the reset state machine enters the
WAIT_ClkFwdRst1 state, where the 21264A waits for the system to deassert
ClkFwdRst_H. The 21264A samples the deasserting edge of ClkFwdRst_H to
take synchronous actions. It uses this synchronous event to reset the clock forwarding
interface, start the outgoing clocks, and deassert internal reset. The chip then waits 264
cycles before issuing commands. The reset state machine is then in RUN and the
21264A begins fetching code at address 0x780.
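

The power-up path through the reset state machine, as far as Section 7.1 names its states, can be summarized as an ordered list of states and the event that leaves each one. This is a simplified sketch: the event strings are informal labels chosen here, and any intermediate states not named in this section are omitted.

```python
# (state, event that advances to the next state); RUN is the final state.
POWER_UP_SEQUENCE = [
    ("WAIT_SETTLE", "DCOK_H asserted"),
    ("WAIT_NOMINAL", "Reset_L deasserted"),
    ("RAMP1", "ClkIn_x/2 lock interval elapsed"),
    ("RAMP2", "ClkFwdRst_H deasserted (first time)"),
    ("WAIT_BiST", "BiST and SROM load done, SromOE_L deasserted"),
    ("WAIT_ClkFwdRst1", "ClkFwdRst_H deasserted (second time)"),
    ("RUN", None),
]


def run_power_up(events):
    """Walk the sequence; events that arrive out of order are ignored."""
    index = 0
    for event in events:
        _, expected = POWER_UP_SEQUENCE[index]
        if event == expected:
            index += 1
    return POWER_UP_SEQUENCE[index][0]
```

Feeding the six events in order ends in RUN, while an event that arrives before its predecessor (for example, Reset_L deasserting before DCOK_H is asserted) leaves the model where it was.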


Table 7-4 lists signals relevant to the power-up flow, provides a short description of
each, and any relevant constraints.


Table 7-4 Power-Up Flow Signals and Their Constraints


Signal Name       Description                       Constraint

ClkIn_x           Differential clocks that are      Clocks must be running before DCOK_H is
                  inputs to PLL or are bypassed     asserted.
                  onto GCLK directly

PLL_VDD           VDD supply to PLL                 PLL_VDD must lead VDD.

VDD               VDD supply to the 21264A          -
                  chip logic (except PLL)

DCOK_H            Logic signal to the 21264A        -
                  that the VDD supply is good

Reset_L           RESET pin asserted by             Reset_L must be asserted prior to DCOK_H and
                  SYSTEM to the 21264A              must remain asserted for at least 100 ms after
                                                    DCOK_H is asserted. This allows for PLL settling
                                                    time. Deassertion of Reset_L causes the 21264A
                                                    to ramp divisors to their final value and begin
                                                    BiST.

ClkFwdRst_H       Signal asserted by SYSTEM         ClkFwdRst_H must be deasserted after PLL has
Deassertion #1    to synchronously commence         achieved its lock in its final divisor value (about
                  built-in self-test and SROM       20 µs). The deassertion causes built-in self-test to
                  load                              begin on an internal clock cycle that corresponds
                                                    to one framing clock cycle after ClkFwdRst_H is
                                                    deasserted. ClkFwdRst_H can be asserted after
                                                    one frame clock cycle. See Figure 7-1.

ClkFwdRst_H       Signal asserted by SYSTEM         ClkFwdRst_H must be deasserted when the Cbox
Deassertion #2    to initialize and reset clock     has loaded configuration information. This occurs
                  forwarding interfaces             as the first part of the serial ROM load, after BiST
                                                    is run. Once ClkFwdRst_H is deasserted, the
                                                    interface is initialized and can receive probe
                                                    requests from the 21264A.
7.2 Fault Reset Flow


The fault reset sequence of operation is triggered by the assertion of the ClkFwdRst_H
signal line. Figure 7-2 shows the fault reset sequence of operation. The reset state
machine is initially in RUN state. ClkFwdRst_H is asserted by the system, which
causes the state machine to transition to the WAIT_FAULT_RESET state.


The 21264A internally resets a minimum amount of internal state. Note the effects of 
that reset on the IPRs in Table 7-5. 


Table 7-5 Effect on IPRs After Fault Reset


IPR           After Reset

PAL_BASE      Maintained (not reset)
I_CTL         Bit value = 3 (both Icaches are enabled)
PCTX[FPE]     Set
WRITE_MANY    Cleared (That is, the WRITE_MANY chain is initialized and the Bcache is turned off.)
EXC_ADDR      Set to an address that is close to the PC


The 21264A then waits for ClkFwdRst_H to deassert twice:


• One deassert to transition directly to the WAIT_ClkFwdRst1 state without
  performing any BiST

• One deassert to initialize the clock forwarding interface


The 21264A then begins fetching code at PAL_BASE + 0x780.


Figure 7-2 shows the fault reset sequence of operation. In Figure 7-2, note the
following symbols for constraints and information:


Constraints:

A   Min=1 FrameClk_x cycle


Information:

a   Approximately 264 GCLK cycles
b   Approximately 525 GCLK cycles for external framing clock to be sampled and captured
c   1 FrameClk_x cycle plus 2 GCLK cycles
e   Next FrameClk_x rising edge
f   3 FrameClk_x cycles
g   Approximately 264 GCLK cycles to prevent first command from appearing too early


Figure 7-2 Fault Reset Sequence of Operation


7.3 Energy Star Certification and Sleep Mode Flow


The 21264A is Energy Star compliant. Energy Star is a program administered by the 
Environmental Protection Agency to reduce energy consumption. For compliance, a 
computer must automatically enter a low power sleep mode using 30 watts or less after 
a specified period of inactivity. When the system is awakened, the user shall be 
returned automatically to the same situation that existed prior to entering sleep mode. 


During normal operation, the 21264A encounters inactive periods and enters a mode 
that saves the entire active processor state to memory. 


The PALcode is responsible for saving all necessary state to DRAM and flushing the 
caches. 


The sleep mode sequence of operations is triggered by the PALcode twice performing a
HW_MTPR to the Ibox SLEEP IPR. The first write prevents the assertion of
ClkFwdRst_H from fault-resetting the chip.


The PALcode then informs the system, in an implementation-dependent way, that it 
may assert ClkFwdRst_H. 


On the second HW_MTPR to the SLEEP IPR, the PLL begins to ramp down and the 
21264A can then respond to the ClkFwdRst_H that was asserted by the system, caus- 
ing the outgoing clocks from the 21264A to stop. 


The PLL ramp-down sequence takes exactly the same amount of time as the ramp-up
sequence described in Section 7.1.3. The same internal duration counter is used and the
reset state machine transitions through the DOWN1, DOWN2, and DOWN3 states,
which have similar PLL divisor ratios and clock speeds to the RAMP2, RAMP1, and
WAIT_NOMINAL states.


After the PLL has finished ramping down, the reset state machine enters the
WAIT_INTERRUPT state. Note the effects of the entry into that state on the IPRs
listed in Table 7-6.


Table 7-6 Effect on IPRs After Transition Through Sleep Mode

IPR            Effects After Transition Through Sleep Mode
PAL_BASE       Maintained (not reset)
I_CTL          Bit value = 3 (both Icaches are enabled)
PCTX[FPE]      Set
WRITE_MANY     Cleared (That is, the WRITE_MANY chain is initialized and the Bcache is
               turned off.)


Note that interrupt enables are maintained during sleep mode, enabling the 21264A to
wake up. The 21264A waits for either an unmasked clock interrupt or an unmasked
device interrupt from the system.


When an enabled interrupt occurs, the PLL ramps back to full frequency. Subsequent to 
that, the 21264A performs a built-in self-initialization (BiSI), a shortened built-in self- 
test, which initializes the internal arrayed structures. The SROM is not reloaded. 
Instead, the 21264A begins fetching code from the system at address PAL_BASE + 
0x780. 


Figure 7-3 shows the sleep mode sequence of operations. In Figure 7-3, note the
following constraint and informational symbols:


Constraints:

A   Min=1 FrameClk_x cycle


Informational symbols:

Approximately 525 GCLK cycles for external framing clock to be sampled and captured
Next FrameClk_x rising edge
1 FrameClk_x cycle
3 FrameClk_x cycles
Approximately 264 GCLK cycles to prevent first command from appearing too early
Approximately 8192 GCLK cycles for BiSI
16 GCLK cycles


Figure 7-3 Sleep Mode Sequence of Operation

Table 7-7 describes each signal and constraint for the sleep mode sequence.


Table 7-7 Signals and Constraints for the Sleep Mode Sequence

Signal Name        Description                        Constraint
ClkFwdRst_H        Signal asserted by the system to   ClkFwdRst_H must be asserted by the system
                   initialize and reset clock         when entering sleep mode. The system deasserts
                   forwarding interfaces              ClkFwdRst_H no sooner than one FrameClk_H
                                                      cycle after sourcing an interrupt to the 21264A.
Forwarded clocks   Bit clocks forwarded to/from the   Clocks stop running under ClkFwdRst_H.
                   21264A
System interrupt   Asynchronous interrupt which       -
                   causes the 21264A to exit sleep
                   mode


7.4 Warm Reset Flow 


The warm reset sequence of operation is triggered by the assertion of the Reset_L sig- 
nal line. The reset state machine is initially in RUN state. The 21264A then, by default, 
ramps down the PLL (similar to the sleep flow sequence) and the reset state machine 
ends up in the WAIT_RESET state. 


Note the effects of entry into that state on the IPRs listed in Table 7-8. 


Table 7-8 Effect on IPRs After Warm Reset

IPR            Effects After Warm Reset
PAL_BASE       Cleared
I_CTL          Cleared
PCTX[FPE]      Set
WRITE_MANY     Cleared (That is, the WRITE_MANY chain is initialized and the Bcache is
               turned off.)


The 21264A waits until Reset_L is deasserted before transitioning from the
WAIT_RESET state. The 21264A ramps up the PLL until the state machine enters the
WAIT_ClkFwdRst0 state. Note that the system must assert ClkFwdRst_H before the
state machine enters the WAIT_ClkFwdRst0 state. Then, similarly to the other flows,
SromOE_L is asserted and the system waits for the deassertion of ClkFwdRst_H.


On the deassertion of ClkFwdRst_H, the 21264A performs BiST and the SROM loading
procedure.


After BiST and SROM loading have completed, SromOE_L deasserts and the 21264A
waits for ClkFwdRst_H to deassert before starting the external clocks and, like the
other flows, waits for 264 cycles before starting instructions.


7.5 Array Initialization


The following arrays are initialized by BiST:


• Icache and Icache tag
• Dcache, Dcache tag, and Duplicate Dcache tag
• Branch history table


The external second-level cache (Bcache) is disabled by Reset_L. 


The Bcache must be initialized by PALcode before it is enabled. 


7.6 Initialization Mode Processing 


The initialization mode allows the 21264A to generate and manipulate cache blocks 
before the system interface has been initialized. Within the 21264A, the Cbox configu- 
ration registers are divided into the WRITE_ONCE and the WRITE_MANY shift reg- 
ister chains (see Sections 5.4.3 and 5.4.4). The WRITE_ONCE chain is loaded from the 
SROM during reset processing, and contains information such as the clock forwarding 
setup values. The WRITE_MANY chain can be written many times using MTPR 
instructions. 


The WRITE_MANY chain contains the following CSRs that are important to initializa- 
tion mode, which must be set to the values in Table 7—9 to initialize the Bcache. 

Table 7-9 WRITE_MANY Chain CSR Values for Bcache Initialization

WRITE_MANY Chain CSRs         Required Value at Initialization Mode
BC_ENABLE                     1
                              The duplicate bits for BC_ENABLE in [14:12] must
                              be 0 during initialization mode.
BC_SIZE[3:0]                  The exact size or maximum size of the Bcache.
INVAL_TO_DIRTY_ENABLE[1:0]    1
SET_DIRTY_ENABLE[2:0]         0
INIT_MODE                     1
EVICT_ENABLE                  0
BC_WRT_STS[3:0]               0
BC_BANK_ENABLE                0
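

The required values in Table 7-9 lend themselves to a mechanical check before a WH64 sweep is started. The following is a minimal sketch only: the dictionary layout, the function name, and the idea of holding CSR values in a Python dict are illustrative assumptions, not part of the 21264A programming interface.

```python
# Required WRITE_MANY chain values for initialization mode, copied from Table 7-9.
# BC_SIZE is system specific (exact or maximum Bcache size), so only its presence
# is checked here.
REQUIRED_INIT_VALUES = {
    "BC_ENABLE": 1,
    "INVAL_TO_DIRTY_ENABLE": 1,
    "SET_DIRTY_ENABLE": 0,
    "INIT_MODE": 1,
    "EVICT_ENABLE": 0,
    "BC_WRT_STS": 0,
    "BC_BANK_ENABLE": 0,
}


def init_mode_violations(csrs):
    """Return the CSR names whose values do not match Table 7-9 (hypothetical helper)."""
    bad = [name for name, want in REQUIRED_INIT_VALUES.items()
           if csrs.get(name) != want]
    if "BC_SIZE" not in csrs:
        bad.append("BC_SIZE")
    return bad


good = dict(REQUIRED_INIT_VALUES, BC_SIZE=15)
print(init_mode_violations(good))                      # no violations
print(init_mode_violations(dict(good, INIT_MODE=0)))   # INIT_MODE flagged
```

A check of this kind is only a convenience for initialization code under development; the table remains the authoritative statement of the required values.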


Except for INIT_MODE, all the CSR registers have been described in earlier sections. 
When asserted, INIT_MODE has the following behavior: 


• Cache block updates to the Dcache set the block to the Clean state.
• Updates to the Bcache use the BC_WRT_STS[3:0] bits.
• WrVictimBlk command generation to the system interface is squashed.


Using the INVAL_TO_DIRTY_ENABLE and INIT_MODE registers, initialization 
code loaded from the SROM can generate and delete blocks inside the 21264A without 
system interaction. This behavior is very useful for initialization and startup processing, 
when the system interfaces are not fully functional. Figure 7-4 shows a code example 
for initializing Bcache. 


Figure 7-4 Example for Initializing Bcache

Reset chip and load Icache with this code

set init_mode           ;now all WrVictims are ignored
                        ;bc_enable_a          1
                        ;zeroblk_enable_a
                        ;set_dirty_enable_a   0
                        ;init_mode_a          1
                        ;enable_evict_a       0
                        ;bc_wrt_sts_a         0
                        ;bc_bank_enable_a     0
                        ;bc_size_a            15

                        ;now all writes to Bcache actually invalidate
                        ;the Bcache. (if space was needed for scratch
                        ;pad, the status bits could just as
                        ;well be Valid)

for 2 x bc_size         ;This loop generates legal ECC data, and
{ WH64 address }        ;invalidate tags which are written to the
                        ;Bcache for all but the final 64KB of address.

turn_off_bcache:        ;bc_enable_a
                        ;init_mode_a
                        ;bc_size_a
                        ;zeroblk_enable_a
                        ;enable_evict_a
                        ;set_dirty_enable_a
                        ;bc_bank_enable_a
                        ;bc_wrt_sts_a

SweepMemory:            ;Write good parity/ecc to memory by
                        ;writing all memory locations. This is
                        ;done by WH64 of memory addresses

turn_on_bcache:         ;bc_enable_a          0
                        ;bc_size_a            Actual Bcache size
                        ;zeroblk_enable_a     3
                        ;set_dirty_enable_a   6
                        ;init_mode_a          0
                        ;enable_evict_a       0
                        ;bc_wrt_sts_a         0
                        ;bc_bank_enable_a     0

for 2 x bc_size         ;This loop generates legal ECC data, and
{ WH64 address }        ;invalidate tags which are written to the
                        ;Bcache for all but the final 64KB of address.
for 2 x dcache size
{ ECB address }         ;and cleans up the Dcache also.
(done)


In addition to initialization, the dynamic programming ability of the WRITE_MANY
chain provides the basic tools to build various other software flows such as dynamically
changing the Bcache enable/size parameters for performance testing.


7.7 External Interface Initialization 


After reset, the system interface is in the default configuration dictated by the reset state 
of the IPR bits that select the configuration options. 


The response to system interface commands and internally generated memory accesses 
is determined by this default configuration. System environments that are not compati- 
ble with the default configuration must use the SROM Icache load feature to initially 
load and execute a PALcode program to configure the external system interface unit 
IPRs as needed. 


7.8 Internal Processor Register Power-Up Reset State 


Many IPR bits are not initialized by reset. They are located in error-reporting registers
and other IPR states. They must be initialized by initialization PALcode. Tables 7-5,
7-6, and 7-8 list the effects on IPRs of fault reset, transition through sleep mode, and
warm reset, respectively. Table 7-10 lists the state of all internal processor registers
(IPRs) immediately following power-up reset. The table also specifies which registers
need to be initialized by power-up PALcode.


Table 7-10 Internal Processor Registers at Power-Up Reset State

Mnemonic      Register Name                        Reset State        Comments

Ibox IPRs
ITB_TAG       ITB tag array write                  X                  -
ITB_PTE       ITB PTE array write                  X                  -
ITB_IAP       ITB invalidate-all (ASM=0)           X                  -
ITB_IA        ITB invalidate all                   X                  Must be written to in PALcode.
ITB_IS        ITB invalidate single                X                  -
PMPC          ProfileMePC                          X                  -
EXC_ADDR      Exception address                    X                  -
IVA_FORM      Instruction VA format                X                  -
IER_CM        Interrupt enable current mode        X                  Must be written to in PALcode.
SIRR          Software interrupt request           X                  -
ISUM          Interrupt summary                    X                  -
HW_INT_CLR    Hardware interrupt clear             X                  Must be cleared in PALcode.
EXC_SUM       Exception summary                    X                  -
PAL_BASE      PAL base address                     Cleared            -
I_CTL         Ibox control                         IC_EN = 3          All other bits are cleared on reset.
I_STAT        Ibox status                          X                  Must be cleared in PALcode.
IC_FLUSH      Icache flush                         X                  -
CLR_MAP       Clear virtual-to-physical map        X                  -
SLEEP         Sleep mode                           X                  -
PCTX          Ibox process context                 PCTX[FPE] is set.  All other bits are cleared.
PCTR_CTL      Performance counter control          X                  Must be cleared in PALcode.

Ebox IPRs
CC            Cycle counter                        X                  Must be cleared in PALcode.
CC_CTL        Cycle counter control                X                  Must be cleared in PALcode.
VA            Virtual address                      X                  -
VA_FORM       Virtual address format               X                  -
VA_CTL        Virtual address control              X                  Must be cleared in PALcode.

Mbox IPRs
DTB_TAG0      DTB tag array write 0                Cleared            -
DTB_TAG1      DTB tag array write 1                Cleared            -
DTB_PTE0      DTB PTE array write 0                Cleared            -
DTB_PTE1      DTB PTE array write 1                Cleared            -
DTB_ALTMODE   DTB alternate processor mode         X                  PALcode must initialize.
DTB_IAP       DTB invalidate all process (ASM=0)   X                  -
DTB_IA        DTB invalidate all process           X                  Must be written to in PALcode.
DTB_IS0       DTB invalidate single (array 0)      X                  -
DTB_IS1       DTB invalidate single (array 1)      X                  -
DTB_ASN0      DTB address space number 0           Cleared            -
DTB_ASN1      DTB address space number 1           Cleared            -
MM_STAT       Memory management status             X                  -
M_CTL         Mbox control                         Cleared            -
DC_CTL        Dcache control                       DC_CTL[7:2] are cleared at reset.
                                                   DC_CTL[1:0] are set at power up.
DC_STAT       Dcache status                        X                  Must be cleared in PALcode.

Cbox IPRs
C_DATA        Cbox data                            X                  Must be read in PALcode.
C_SHFT        Cbox shift control                   X                  -


7.9 IEEE 1149.1 Test Port Reset 


Signal Trst_L must be asserted when powering up the 21264A. Trst_L must not be 
deasserted prior to assertion of DCOK_H. Trst_L can remain asserted during normal 
operation of the 21264A. 


7.10 Reset State Machine 


The state diagram in Figure 7-5 summarizes how the 21264A transitions into running
code. Each state is described in Table 7-11, which also describes outputs and
approximate state transition equations. Note that there are implicit transitions from
each state to an appropriate down-ramp state when Reset_L is asserted.


Figure 7-5 21264A Reset State Machine State Diagram 
Table 7-11 21264A Reset State Machine State Descriptions

State Name         Description

COLD               Chip cold. Transitioned to WAIT_SETTLE with assertion of Reset_L,
                   PLL_VDD, and VDD.

WAIT_SETTLE        PLL_VDD asserted; PLL at minimum frequency.

WAIT_NOMINAL       Triggered by assertion of DCOK_H. PLL achieves a lock at Xdiv and Zdiv
                   divisors equal 16 and 32, respectively.

RAMP1              Triggered by Reset_L deassertion; Xdiv and Zdiv divisors are changed to
                   2 and 4, respectively, increasing the internal GCLK frequency. An
                   internal duration counter is initialized to count 4108 GCLK cycles.

RAMP2              Triggered by the duration counter reaching 4108 cycles, the Xdiv and
                   Zdiv divisors are changed to 1 and 2, respectively, and the frequency
                   is increased. The duration counter is reloaded to count 8205 cycles.

WAIT_ClkFwdRst0    Triggered by the duration counter reaching 8205 cycles (or by the
                   deassertion of Reset_L while in the WAIT_RESET state). 21264A asserts
                   SromOE_L and waits for SYSTEM to deassert ClkFwdReset_H. The
                   deassertion must be synchronous to a falling edge of FrameClk_H.
                   21264A uses this deassertion to begin BiST and SROM load at a
                   predictable time. 21264A samples and generates an internal, aligned
                   copy of FrameClk_H, and, in turn, uses this clock to sample
                   ClkFwdReset_H.

WAIT_BiST          BiST and SROM load is started. The SROM first loads the Write-once
                   chain and then reads the number of bits of Icache data to load.

WAIT_BiSI          This state is entered when 'waking up' from sleep mode. 21264A
                   receives an external interrupt, ramps the PLL, synchronously samples a
                   transition on ClkFwdReset_H, and runs built-in self-initialization to
                   clear the internal caches. Built-in self-test is not performed and the
                   SROM is not loaded.

WAIT_ClkFwdRst1    Entered when the appropriate amount of BiST and SROM loading has been
                   completed. 21264A deasserts SromOE_L and waits for SYSTEM to deassert
                   ClkFwdReset_H. The deassertion must be synchronous to a rising edge of
                   FrameClk_H. 21264A uses this synchronous event to reset the clock
                   forwarding interface and deassert internal reset. 21264A subsequently
                   begins running code (either preloaded in the SROM or located in
                   memory) and begins system transactions.

RUN                Chip is running software, interface is reset, and system transactions
                   can be processed. From power-up, the Icache sets are enabled and
                   contain bootstrap code loaded from the SROM; 21264A executes code from
                   Icache. From wake-up, the Icache sets are disabled and 21264A fetches
                   and executes code from DRAM.

WAIT_RESET         Triggered by duration counter reaching 264 cycles, or when Reset_L is
                   asserted when in WAIT_INTERRUPT state. 21264A waits in this state
                   until Reset_L is deasserted, at which point, the PLL starts to ramp up
                   again.

FAULT_RESET        ClkFwdReset is asserted while the 21264A is running. The 21264A
                   internally resets a minimum amount of internal state, waits for clock
                   forward reset deassertion, and begins fetching code at
                   PAL_BASE + 0x780.

DOWN1              21264A was in a state in which GCLK was at its highest speed and
                   Reset_L was asserted. Internal chip functions are reset and the
                   internal duration counter is set to 8205 cycles. The purpose of this
                   sequence is to down-ramp the clocks in anticipation of power being
                   removed. If power is not removed (that is, reset is being toggled),
                   21264A ramps the clocks back to the original speed.
                   This state is also entered when software writes the I_CTL internal
                   processor register to sleep mode.

DOWN2              Triggered by duration counter reaching 8205 cycles, the PLL ramps GCLK
                   frequency down by the first divider ratio (Xdiv and Zdiv equal 2 and
                   4, respectively). This has the effect of halving the GCLK frequency.
                   The duration counter is set to 4108 cycles.

DOWN3              Triggered by duration counter reaching 4108 cycles, the PLL ramps
                   frequency down by the second divider ratio (Xdiv and Zdiv equal 16 and
                   32, respectively). This has the effect of reducing the frequency by a
                   factor of 16 (of the original frequency). The internal counter is set
                   to 264 cycles.

WAIT_INTERRUPT     Triggered by duration counter reaching 264 cycles, the 21264A waits
                   for either an unmasked clock interrupt or unmasked device interrupt
                   from system. The interrupts are wired to the interrupt request and
                   enable internal registers. When an enabled interrupt occurs, the PLL
                   ramps back to full frequency. Subsequent to that, the built-in
                   self-init (BiSI) initializes arrayed structures. The SROM is not
                   reloaded; instead, the 21264A begins fetching code from the SYSTEM.
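

The duration-counter values in Table 7-11 can be turned into approximate ramp times. The sketch below is illustrative only and rests on two assumptions that the table implies but does not state outright: the duration counter counts GCLK cycles at the frequency in effect in that state, and the GCLK period scales linearly with Xdiv (Xdiv = 2 halves the frequency). The names `RAMP_UP`, `RAMP_DOWN`, and both functions are hypothetical.

```python
# (state, Xdiv in effect, GCLK cycles counted while in that state), per Table 7-11.
RAMP_UP = [("RAMP1", 2, 4108), ("RAMP2", 1, 8205)]
RAMP_DOWN = [("DOWN1", 1, 8205), ("DOWN2", 2, 4108), ("DOWN3", 16, 264)]


def full_speed_periods(sequence):
    """Duration of a ramp sequence in units of one full-speed GCLK period.

    Assumes a cycle counted at divisor Xdiv lasts Xdiv full-speed periods.
    """
    return sum(xdiv * cycles for _state, xdiv, cycles in sequence)


def duration_us(sequence, full_speed_gclk_mhz):
    """Duration in microseconds for a given full-speed GCLK frequency in MHz."""
    return full_speed_periods(sequence) / full_speed_gclk_mhz


print(full_speed_periods(RAMP_UP))          # RAMP1 + RAMP2
print(full_speed_periods(RAMP_DOWN[:2]))    # DOWN1 + DOWN2
print(duration_us(RAMP_UP, 500.0))          # ramp-up time at 500 MHz
```

Under these assumptions the DOWN1 and DOWN2 states together last as long as RAMP1 and RAMP2 together, which is consistent with the statement in Section 7.3 that the ramp-down uses the same duration counter as the ramp-up.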


7.11 Phased-Lock Loop (PLL) Functional Description 


The PLL multiplies the clock frequency of a differential input reference clock and 
aligns the phase of its output to that differential input clock. Thus, the 21264A can com- 
municate synchronously on clock boundaries with clock periods that are defined by the 
system. 


Note: For a complete explanation of the PLL, consult the 21264A PLL Specification,
located in the same documentation repository as the 21264A Specifications
(this document).


7.11.1 Differential Reference Clocks 


A skew-controlled, ac-coupled differential clock is provided to the PLL by way of
ClkIn_x. ClkIn_x are input signals to a differential amplifier. The frequency of
ClkIn_x can range from 80 MHz to 200 MHz. ClkIn_x can be sourced by a variety of
components that include ECLps fanout parts or system PLLs. ClkIn_x are also the
primary clock source for the 21264A when in PLL bypass mode.


7.11.2 PLL Output Clocks 
The following sections summarize the PLL output clocks. 


7.11.2.1 GCLK 


The PLL provides an output clock, GCLK, with a frequency that can range from 400 
MHz to 666.7 MHz under full-speed conditions. GCLK is the nominal onchip clock 
that is distributed to the entire 21264A chip. 


7.11.2.2 Differential 21264A Clocks 


The EV6Clk_x output pads provide an external test point to measure the PLL phase
alignment. They do not provide a clock source. EV6Clk_x are square-wave signals
that drive rail-to-rail continually from 0 to 2.3 volts.


7.11.2.3 Nominal Operating Frequency


Under normal operating conditions, the frequency of the PLL output clock, GCLK, is a
simple function of the Ydiv divider value.


Table 7-12 shows the allowable ClkIn_x frequencies for a given operating frequency
of the 21264A and the Ydiv divider. For example, to set the 21264A GCLK frequency to
500 MHz with a ClkIn_x frequency of 166.7 MHz, the system must select a Ydiv
divider of 3 by placing the binary value 0011 on pins IRQ_H[3:0].
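

The worked example above (500 MHz = 166.7 MHz x 3) suggests that GCLK is the reference clock multiplied by Ydiv. Assuming that relationship, the following sketch computes the ClkIn_x frequency for a target GCLK and lists the dividers that keep ClkIn_x inside the 80 MHz to 200 MHz range stated in Section 7.11.1. The function names are illustrative, and the 3 to 10 divider range is taken from the columns of Table 7-12.

```python
CLKIN_MIN_MHZ = 80.0     # lower ClkIn_x limit from Section 7.11.1
CLKIN_MAX_MHZ = 200.0    # upper ClkIn_x limit from Section 7.11.1
USABLE_DIVIDERS = range(3, 11)   # 1 and 2 are test-mode only; 11 to 16 are reserved


def clkin_mhz(gclk_mhz, ydiv):
    """Reference clock (MHz) for a target GCLK, or None if outside 80-200 MHz.

    Assumes GCLK = ClkIn_x * Ydiv, as in the 500 MHz / 166.7 MHz example.
    """
    if ydiv not in USABLE_DIVIDERS:
        raise ValueError("Ydiv must be between 3 and 10")
    freq = gclk_mhz / ydiv
    if CLKIN_MIN_MHZ <= freq <= CLKIN_MAX_MHZ:
        return round(freq, 1)
    return None


def valid_dividers(gclk_mhz):
    """Dividers for which the required ClkIn_x frequency is in range."""
    return [y for y in USABLE_DIVIDERS if clkin_mhz(gclk_mhz, y) is not None]


print(clkin_mhz(500.0, 3))      # the example from the text
print(valid_dividers(666.7))    # dividers usable at a 1.5 ns GCLK period
```

The results can be compared row by row against Table 7-12; entries the function reports as `None` correspond to the dashes in the table.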


Table 7-12 Differential Reference Clock Frequencies in Full-Speed Lock

GCLK                          Reference Clock Frequency (MHz) for Ydiv Dividers (1)
Period (ns)  Frequency (MHz)  3 (2)   4       5       6       7       8       9       10
2.5          400              133.3   100     80      -       -       -       -       -
2.4          416.7            138.9   104.2   83.3    -       -       -       -       -
2.3          434.8            144.9   108.7   87.0    -       -       -       -       -
2.2          454.5            151.5   113.6   90.9    -       -       -       -       -
2.1          476.2            158.7   119.0   95.2    -       -       -       -       -
2.0          500              166.7   125.0   100     83.3    -       -       -       -
1.9          526.3            175.4   131.6   105.3   87.7    -       -       -       -
1.8          555.6            185.2   138.9   111.1   92.6    -       -       -       -
1.7          588.2            196.1   147.1   117.6   98.0    84.0    -       -       -
1.6          625              -       156.3   125.0   104.2   89.3    -       -       -
1.5          666.7            -       166.7   133.3   111.1   95.2    83.3    -       -
1.4          714.3            -       178.6   142.9   119.1   102.0   89.3    -       -
1.3          769.2            -       192.3   153.8   128.2   109.9   96.2    85.5    -
1.2          833.3            -       -       166.7   138.9   119.0   104.2   92.6    83.3

Dividers 11 through 16 are out of range for the 21264A and reserved for future use.

(1) Valid reference clock (ClkIn_x) frequencies for the 21264A are specified in the
    range from 80 to 200 MHz. Divider values that are out of that range are displayed
    as a dash ("-").

(2) Dividers of 1 and 2 are to be used only in a PLL test mode.


7.11.2.4 Power-Up/Reset Clocking 


During the power-up/reset sequence, when not in PLL bypass mode, there may be a 
period of time when ClkIn_x is not yet running, but there is a voltage on PLL_VDD. 
The signal DCOK_H is deasserted until power is good throughout the system. The 
10% to 90% rise time of DCOK_H should be less than 2 ns. The deasserted state of 
DCOK_H and the presence of PLL_VDD causes the PLL to generate a global clock 
that is distributed throughout the 21264A with a frequency range of 1 MHz to 500 
MHz. The presence of the global clock during this period avoids permanent damage to
the 21264A.


Error Detection and Error Handling





This chapter gives an overview of the 21264A error detection and error handling mech- 
anisms, and is organized as follows: 


• Data error correction code
• Icache data or tag parity error
• Dcache tag parity error
• Dcache data correctable ECC error
• Dcache store second error
• Dcache duplicate tag parity error
• Bcache tag parity error
• Bcache data correctable ECC error
• Memory/system port data correctable ECC error
• Bcache data correctable ECC error on a probe
• Double-bit fill errors
• Error case summary


Table 8-1 summarizes the 21264A error detection.


Table 8-1 21264A Error Detection Mechanisms

Component                    Error Detection Mechanism
Bcache tag                   Parity protected.
Bcache data array            Quadword-ECC protected.
Dcache tag array             Parity protected.
Dcache duplicate tag array   Parity protected.
Dcache data array            Quadword-ECC protected; however, this mode of operation is
                             only supported in systems that have ECC enabled on both the
                             system and Bcache ports.
Icache tag array             Parity protected.
Icache data array            Parity protected.
System port data bus         Quadword-ECC protected.


8.1 Data Error Correction Code 


The 21264A supports a quadword error correction code (ECC) for the system data bus. 
ECC is generated by the 21264A for all memory write transactions (WrVictimBlk)
emitted from the 21264A and for all probe data. ECC is also checked on every memory 
read transaction for single-bit correction and double-bit error detection. Bcache data is 
checked for fills to the Dcache and Icache, and for Bcache-to-system transfers that are 
initiated by a probe (if enabled by the CSR ENABLE_PROBE_CHECK). 


The 21264A ECC implementation corrects single-bit errors in hardware. 


I/O write transaction data will not have a valid ECC (the ECC bits must be ignored by 
the system). Also, ECC checking is not performed on I/O read data. 


Error detection and correction can be enabled/disabled by way of Mbox IPR 
DC_CTL[DCDAT_ERR_EN]. 


Table 8-2 shows the ECC code.


Table 8-2 64-Bit Data and Check Bit ECC Code

                 11 1111 1111 2222 2222 2233 3333 3333 4444 4444 4455 5555 5555 6666 CCCC CCCC
      0123 4567 8901 2345 6789 0123 4567 8901 2345 6789 0123 4567 8901 2345 6789 0123 0123 4567

CB0   0111 0100 1101 0010 0111 0100 1101 0010 1000 1011 0010 1101 1000 1011 0010 1101 1000 0000
CB1   1110 1010 1010 1000 1110 1010 1010 1000 1110 1010 1010 1000 1110 1010 1010 1000 0100 0000
CB2   1001 1001 0110 0101 1001 1001 0110 0101 1001 1001 0110 0101 1001 1001 0110 0101 0010 0000
CB3   1100 0111 0001 1100 1100 0111 0001 1100 1100 0111 0001 1100 1100 0111 0001 1100 0001 0000
CB4   0011 1111 0000 0011 0011 1111 0000 0011 0011 1111 0000 0011 0011 1111 0000 0011 0000 1000
CB5   0000 0000 1111 1111 0000 0000 1111 1111 0000 0000 1111 1111 0000 0000 1111 1111 0000 0100
CB6   1111 1111 0000 0000 0000 0000 1111 1111 1111 1111 0000 0000 0000 0000 1111 1111 0000 0010
CB7   1111 1111 0000 0000 0000 0000 1111 1111 0000 0000 1111 1111 1111 1111 0000 0000 0000 0001
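

To illustrate how a code of this shape yields single-bit correction and double-bit detection, the sketch below computes check bits from the data columns of Table 8-2 and decodes by syndrome lookup. It is a software model for illustration, not the hardware implementation. It assumes that column n of the table corresponds to data bit n, that each check bit is the XOR (even parity) of the data bits marked 1 in its row, and it uses hypothetical names throughout; the table does not state the parity polarity.

```python
# Rows CB0..CB7 of Table 8-2, data columns only (leftmost character = data bit 0).
TABLE_8_2 = [
    "0111 0100 1101 0010 0111 0100 1101 0010 1000 1011 0010 1101 1000 1011 0010 1101",
    "1110 1010 1010 1000 1110 1010 1010 1000 1110 1010 1010 1000 1110 1010 1010 1000",
    "1001 1001 0110 0101 1001 1001 0110 0101 1001 1001 0110 0101 1001 1001 0110 0101",
    "1100 0111 0001 1100 1100 0111 0001 1100 1100 0111 0001 1100 1100 0111 0001 1100",
    "0011 1111 0000 0011 0011 1111 0000 0011 0011 1111 0000 0011 0011 1111 0000 0011",
    "0000 0000 1111 1111 0000 0000 1111 1111 0000 0000 1111 1111 0000 0000 1111 1111",
    "1111 1111 0000 0000 0000 0000 1111 1111 1111 1111 0000 0000 0000 0000 1111 1111",
    "1111 1111 0000 0000 0000 0000 1111 1111 0000 0000 1111 1111 1111 1111 0000 0000",
]

# MASKS[i]: integer whose bit n is set when data bit n feeds check bit CBi.
MASKS = [
    sum(1 << n for n, c in enumerate(row.replace(" ", "")) if c == "1")
    for row in TABLE_8_2
]


def parity(value):
    """Return 1 when value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def check_bits(data):
    """Compute the 8 check bits for a 64-bit quadword (even parity assumed)."""
    return sum(parity(data & mask) << i for i, mask in enumerate(MASKS))


# Syndrome produced when data bit n alone is flipped (column n of the table).
SYNDROME_TO_DATA_BIT = {
    sum(((mask >> n) & 1) << i for i, mask in enumerate(MASKS)): n
    for n in range(64)
}


def decode(data, stored_check):
    """Return (data, status): correct one flipped bit, flag other nonzero syndromes."""
    syndrome = check_bits(data) ^ stored_check
    if syndrome == 0:
        return data, "ok"
    if syndrome in SYNDROME_TO_DATA_BIT:
        return data ^ (1 << SYNDROME_TO_DATA_BIT[syndrome]), "corrected"
    if bin(syndrome).count("1") == 1:
        return data, "corrected"        # a single check bit was flipped
    return data, "uncorrectable"


word = 0x0123456789ABCDEF
cb = check_bits(word)
print(decode(word ^ (1 << 17), cb))     # single-bit error in the data
print(decode(word ^ 0b11, cb))          # double-bit error in the data
```

Every data column in the table has an odd number of ones and no two columns are equal, so a single flipped bit produces a unique odd-weight syndrome, while two flipped bits produce a nonzero even-weight syndrome that matches no column.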


8.2 Icache Data or Tag Parity Error


The following actions are performed when an Icache data or tag parity error occurs. 


1. When the hardware detects an error during an Icache read transaction, it traps and 
replays the instructions that were fetched during the error, then flushes the entire 
Icache so the re-fetched instructions do not come directly from the Icache. 


2. I_STAT[PAR] is set.


3. A corrected read data (CRD) interrupt is posted, when enabled. (Pass 3 only)


8.3 Dcache Tag Parity Error


The primary copies of the Dcache tags are used only when servicing 21264A-generated
load and store instructions. There are correctable and uncorrectable forms of this
error. If an issued load or store instruction detects a Dcache tag parity error, the
following actions are performed:


1. MM_STAT[DC_TAG_PERR] is set. 
2. A Dstream fault (DFAULT) is taken. 


3. The virtual address associated with the error is available in the VA register. 


4. The PALcode flushes the error block by temporarily disabling 
DC_CTL[DCTAG_PAR_EN] and evicting the block using two HW_LD instruc- 
tions. The onchip duplicate tag provides the correct victim address and cache 
coherence state. 


If a retried load instruction detects the Dcache tag parity error, the memory reference 
may have already been retired, so the EXC_ADDR is not available. In this case, the 
error is uncorrectable and the Mbox performs the following actions: 


• Either DC_STAT[TPERR_P0] or DC_STAT[TPERR_P1] is set, indicating the
  source of the error.


• When enabled, a machine check (MCHK) is posted. The MCHK is taken when not
  in PALmode.


8.4 Dcache Data Single-Bit Correctable ECC Error 


The following operations may cause Dcache data ECC errors: 

• Load instructions
• Stores of less than quadword length
• Dcache victim read transactions


The hardware flow used for Dcache data ECC errors depends on the event that 
caused the error. 


8.4.1 Load Instruction 


Loads that read data from the Dcache may do so either in the same cycle as the Dcache 
tag probe (typical case) or in some subsequent cycle (load-queue retry). The hardware 
functional flows for these two error cases differ slightly. 


When a load instruction reads the Dcache data array in the same cycle as the tag array, 
if an ECC error occurs on the LSD ECC error detectors, then the Ibox stops retiring 
instructions and does not resume retiring until after hardware recovers from the error. 


If an ECC error occurs on the LSD ECC error detectors, when a load instruction reads 
the Dcache tag array before it reads the Dcache data array, then the load instruction may 
have already been retired. In either case: 


• The incorrect data is written into the load instruction’s destination register;
  however, the load queue retains the state associated with the load instruction.


• A consumer of the load instruction’s data may be issued before the error is
  recognized; however, the Ibox will invoke a replay trap at an instruction that is
  older than (or equal to) any instruction that consumes the load instruction’s data,
  and then stalls the replayed Istream in the map stage of the pipeline until the
  error is corrected.


• Given a READ_ERR read-type from the Mbox for the error load instruction, the
  Cbox scrubs the block in the Dcache by evicting the block into the victim buffer
  (thereby scrubbing it) and writing it back into the Dcache as follows:

  - C_STAT[DSTREAM_DC_ERR] is set.
  - C_ADDR contains bits [19:6] of the Dcache address of the block that contains
    the error (bits [42:20] of the physical address are not updated).
  - DC_STAT[ECC_ERR_LD] is set.
  - The load queue retries the load and rewrites the register.
  - A corrected read data (CRD) error interrupt is posted, when enabled.


Note: Errors in speculative load instructions cause a CRD error interrupt 
to be posted but the data is not scrubbed by hardware. The PALcode 
cannot perform a scrub because C_STAT is zero and C_ADDR does not 
contain the address of the error. 


8.4.2 Store Instruction (Quadword or Smaller) 


A store instruction that is a quadword or smaller could invoke a Dcache ECC error, 
since the original quadword must be read to calculate the new check bits. 


• The Mbox scrubs the original quadword and replays the write transaction.
• DC_STAT[ECC_ERR_ST] is set.
• A corrected read data (CRD) error interrupt is posted, when enabled.


8.4.3 Dcache Victim Extracts


• Dcache victims with an ECC error are scrubbed as they are written into the
  victim data buffer.
• No status is logged.
• No exception is posted.


8.5 Dcache Store Second Error 


A second store instruction error is logged when it occurs close behind the first. 
Neither error is corrected. 


• DC_STAT[ECC_ERR_ST] is set.
• DC_STAT[SEO] is set.
• When enabled, a machine check (MCHK) is posted. The MCHK is taken when not
  in PALmode.


8.6 Dcache Duplicate Tag Parity Error 


The Dcache duplicate tag has the correct version of the Dcache coherence state for the 
21264A, allowing it to be used for correct tag/status data when the Dcache tags gener- 
ate a parity error. These tags are parity protected also; however, the Dcache duplicate 
tag cell is designed to be much more tolerant of soft errors. The parity generators for the 
duplicate tags are enabled whenever the Cbox performs a physically-indexed read 
transaction of eight locations in the tag array. If an error is generated, the following 
actions are taken: 


• Dcache duplicate tag parity errors are not recoverable.
• C_STAT[DC_PERR] is set.
• C_ADDR contains bits [42:6] of the Dcache duplicate tag address of the block that
  contains the error.
• When enabled, a machine check (MCHK) is posted. The MCHK is taken when not
  in PALmode.


8.7 Bcache Tag Parity Error 


The Bcache tag parity is checked on all Bcache tag references, including references
invoked by system probes. If an error is detected, the following actions are taken:


• Bcache tag parity errors are not recoverable.
• C_STAT[BC_PERR] is set.
• C_ADDR contains bits [42:6] of the Bcache address of the block that contains the
  error.
• When enabled, a machine check (MCHK) is posted. The MCHK is taken when not
  in PALmode.


8.8 Controlling Bcache Block Parity Calculation 


Parity is calculated for either valid Bcache blocks or all Bcache blocks. The calculation 
is controlled by the value in Cbox CSR BC_VALID_MODE in the WRITE_MANY 
chain, as follows: 


• If the MSB of BcTag_H is less than the value of Maximum PA in Table 4-13, then
  BC_VALID_MODE=1 and parity is calculated for only valid Bcache blocks.
• If the MSB of BcTag_H is greater than or equal to the value of Maximum PA in
  Table 4-13, then BC_VALID_MODE=0 and parity is calculated for all Bcache
  blocks.


For example, if the Bcache tag is BcTag_H[38:20] and Maximum PA is 36, then the MSB
(38) is greater than or equal to 36, so BC_VALID_MODE=0 and parity is calculated
for all Bcache blocks.
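

The rule in this section reduces to a single comparison. The following Python sketch
is illustrative only and is not part of the 21264A programming model: the function
name bc_valid_mode and its parameter names are hypothetical, and the Maximum PA value
must be taken from Table 4-13, which is not reproduced here.

```python
def bc_valid_mode(bctag_msb: int, maximum_pa: int) -> int:
    """Return the BC_VALID_MODE value implied by the rule in Section 8.8.

    bctag_msb  -- index of the most significant BcTag_H bit in use
                  (for example, 38 for BcTag_H[38:20])
    maximum_pa -- the Maximum PA value from Table 4-13 (caller-supplied)
    """
    # MSB below Maximum PA: parity is calculated for valid Bcache blocks only.
    if bctag_msb < maximum_pa:
        return 1
    # MSB at or above Maximum PA: parity is calculated for all Bcache blocks.
    return 0


# The worked example from the text: BcTag_H[38:20] with Maximum PA = 36.
print(bc_valid_mode(38, 36))  # prints 0
```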


8.9 Bcache Data Single-Bit Correctable ECC Error 


The following actions may trigger Bcache data ECC errors: 


• Icache fill, data possibly used by Icache
• Dcache fill, data possibly used by load instruction
• Bcache victim during an ECB instruction or during a Dcache/Bcache miss


The recovery mechanism depends on the action that triggered the error. 


8.9.1 Icache Fill from Bcache 


For an Icache fill, the LSD ECC checkers detect the error, and bad Icache data parity is 
generated for the octaword that contains the quadword in error. If an error is detected, 
the following actions are taken: 


• The hardware flushes the Icache.
• C_STAT[ISTREAM_BC_ERR] is set.
• C_ADDR contains bits [42:6] of the Bcache fill address of the block that contains
  the error.
• C_SYNDROME_0[7:0] and C_SYNDROME_1[7:0] contain the syndrome of
  quadword 0 and 1, respectively, of the octaword subblock that contains the error.
• A machine check (MCHK) is posted and taken immediately. The PALcode machine
  check handler performs a scrubbing operation as described in Section D.36 to
  ensure that the origination point of the error is corrected.


Note: A corrected read data (CRD) error interrupt is also posted in case this error
is in a speculative path and the MCHK is removed. The CRD PALcode
reads the status, to detect this condition, and scrubs the block. In the normal
MCHK flow, the PALcode clears the pending CRD error.


8.9.2 Dcache Fill from Bcache 


If the quadword in error is not used to satisfy a load instruction, a hardware recovery 
flow is not invoked. The quadword in error, and its associated check bits, are written 
into the Dcache. However, status is logged as shown in the bulleted list below, and a 
corrected read data (CRD) error interrupt is posted, when enabled. PALcode may elect 
to correct the error by scrubbing the block. If the error is not corrected by PALcode 
when it occurs, the error will be detected and corrected by a later load/victim operation. 


If the quadword in error is used to satisfy a load instruction, then the flow is very simi- 
lar to that used for a Dcache ECC error. The LSD ECC checker detects the error and the 
21264A performs the following actions: 


• The load instruction’s destination register is written with incorrect data; however,
  the load queue will retain the state associated with the load instruction.
• A consumer of the load instruction’s data may be issued before the error is
  recognized. The Ibox will invoke a replay trap at an instruction that is older than (or
  equal to) any instruction that consumes the load instruction’s data. The 21264A
  then stalls the replayed Istream in the map stage of the pipeline, until the error is
  corrected.
• With a READ_ERR read type from the Mbox for the load instruction in error, the
  Cbox scrubs the block in the Dcache by evicting the block into the victim buffer
  and writing it back into the Dcache.
• C_STAT[DSTREAM_BC_ERR] is set.
• C_ADDR contains bits [42:6] of the Bcache fill address of the block that contains
  the error.
• C_SYNDROME_0[7:0] and C_SYNDROME_1[7:0] contain the syndrome of
  quadword 0 and 1, respectively, of the octaword subblock that contains the error.
• The load queue retries the load instruction and rewrites the register.
• DC_STAT[ECC_ERR_LD] is set.
• A corrected read data (CRD) error interrupt is posted, when enabled.


Note: Errors in speculative load instructions cause a CRD error to be posted but 
the data is not scrubbed by hardware. The PALcode cannot perform a scrub 
operation because C_STAT is zero and C_ADDR does not contain the 
address of the block in error. 
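

The distinction between the two cases above can be sketched from the point of view of
CRD-handling software. This Python fragment is a hedged illustration, not PALcode: the
function name crd_scrub_address is hypothetical, C_STAT is treated as an opaque value
that is zero when nothing was logged, and the sketch assumes the caller has already
extracted address bits [42:6] from C_ADDR as an integer field.

```python
from typing import Optional

ADDR_FIELD_BITS = 37   # address bits [42:6] span 37 bits
BLOCK_SHIFT = 6        # bits [5:0] are not held in C_ADDR


def crd_scrub_address(c_stat: int, addr_bits_42_6: int) -> Optional[int]:
    """Return the block address to scrub, or None when no scrub is possible.

    c_stat         -- raw C_STAT value; zero means no status was logged
    addr_bits_42_6 -- address bits [42:6] from C_ADDR, as an integer (assumed layout)
    """
    # Speculative-load case: C_STAT is zero and C_ADDR does not hold the
    # address of the block in error, so there is nothing to scrub.
    if c_stat == 0:
        return None
    # Logged case: rebuild the block-aligned physical address.
    field = addr_bits_42_6 & ((1 << ADDR_FIELD_BITS) - 1)
    return field << BLOCK_SHIFT
```

With these assumptions, a zero C_STAT yields None and the handler only logs the CRD
error; any nonzero C_STAT yields a 64-byte-aligned address that a scrub routine could use.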


8.9.3 Bcache Victim Read 


A victim from the Bcache is written directly to the system port, without correction. The 
ECC parity checker on the LSD detects the error and posts a corrected read data (CRD) 
error interrupt. The Cbox error register is not updated. 


8.9.3.1 Bcache Victim Read During a Dcache/Bcache Miss


While the Bcache is servicing a Dcache miss and that Bcache access is also a miss, and 
an error occurs during that Bcache data access, the Cbox does not latch the error infor- 
mation. However, the Mbox correction state machine is activated and it invokes a CRD 
error despite the fact that no correction is performed. 


The Bcache access error is written out to memory and is subsequently detected and cor- 
rected by the next consumer of the data. 


• No correction is made.
• No status is logged (C_STAT = 0).
• A CRD error interrupt is posted, when enabled.


8.9.3.2 Bcache Victim Read During an ECB Instruction


A victim from the Bcache that occurs while an ECB instruction is being executed is 
written directly to the system port without correction. No Cbox registers are set and no 
exception is taken. 


8.10 Memory/System Port Single-Bit Data Correctable ECC Error 


The following actions may cause memory/system port data ECC errors: 
• Icache fill, data possibly used by Icache
• Dcache fill, data possibly used by a load instruction


The recovery mechanism depends on the event that caused the error. 


8.10.1 Icache Fill from Memory 


For an Icache fill the LSD ECC generators detect the error, and bad Icache data 
parity is generated for the octaword that contains the quadword in error. 


• The hardware flushes the Icache.
• C_STAT[ISTREAM_MEM_ERR] is set.
• C_ADDR contains bits [42:6] of the system memory fill address of the block that
  contains the error.
• C_SYNDROME_0[7:0] and C_SYNDROME_1[7:0] contain the syndrome of
  quadword 0 and 1, respectively, of the octaword subblock that contains the error.
• A machine check (MCHK) is posted and taken immediately. The PALcode machine
  check handler performs a scrubbing operation as described in Section D.36 to
  ensure that the origination point of the error is corrected.


Note: Also, a corrected read data (CRD) error is posted, when enabled, in case
this error is in a speculative path and the MCHK is removed. The CRD
error PALcode reads the status to detect this condition and scrubs the block.
In the normal MCHK flow, the PALcode clears the pending CRD error.


8.10.2 Dcache Fill from Memory 


If the quadword in error is not used to satisfy a load instruction, no hardware
recovery flow is invoked. The quadword in error, and its associated check bits, are
written into the Dcache. However, status is logged as shown in the bulleted list below and
a corrected read data (CRD) error interrupt is posted, when enabled. PALcode may
choose to correct the error by scrubbing the block. If the error is not corrected by
PALcode at the time, the error will be detected and corrected by a load/victim operation.


If the quadword in error is used to satisfy a load instruction, then the flow is very simi- 
lar to that used for a Dcache ECC error: 


• The load instruction’s destination register is written with incorrect data; however,
  the load queue will retain the state associated with the load instruction.
• A consumer of the load instruction’s data may be issued before the error is
  recognized; however, the Ibox will invoke a replay trap at an instruction that is
  older than (or equal to) any instruction that consumes the load instruction’s data.
  The Ibox stalls the replayed Istream in the map stage of the pipeline until the error
  is corrected.
• With a READ_ERR read type from the Mbox for the load instruction in error, the
  Cbox scrubs the block in the Dcache by evicting the block into the victim buffer
  and writing it back into the Dcache.
• C_STAT[DSTREAM_MEM_ERR] is set.
• C_ADDR contains bits [42:6] of the system memory fill address of the block that
  contains the error.
• C_SYNDROME_0[7:0] and C_SYNDROME_1[7:0] contain the syndrome of
  quadword 0 and 1, respectively, of the octaword subblock that contains the error.
• The load queue retries the load instruction and rewrites the register.
• DC_STAT[ECC_ERR_LD] is set.
• A corrected read data (CRD) error interrupt is posted, when enabled.


Note: Errors in speculative load instructions cause a CRD error to be posted but
the data is not scrubbed by hardware. The PALcode cannot scrub the data
because C_STAT is zero, and C_ADDR does not have the address of the
block with the error.


8.11 Bcache Data Single-Bit Correctable ECC Error on a Probe


The probed processor extracts the block from its Bcache, signaling a corrected read 
data (CRD) error and latching error information. The single-bit ECC detected error data 
is not corrected by the probed processor, but is forwarded to the requesting processor. 
The requesting processor then detects a related system fill error as a result of this sys- 
tem probe transaction. 


• No hardware correction is performed.
• C_STAT[PROBE_BC_ERR] is set.
• C_ADDR contains bits [42:6] of the Bcache address of the block that contains the
  error.
• C_SYNDROME_0[7:0] and C_SYNDROME_1[7:0] contain the syndrome of
  quadword 0 and 1, respectively, of the octaword subblock that contains the error.
• A CRD error interrupt is posted, when enabled.
• The PALcode on the probed processor may choose to scrub the error, though it will
  probably be scrubbed by the requesting processor.


8.12 Double-Bit Fill Errors 

Double-bit errors for fills are detected, but not corrected, in the 21264A. The following
events may cause a double-bit fill error:

• Icache fill from Bcache
• Dcache fill from Bcache
• Icache fill from memory
• Dcache fill from memory

If an error is detected, the following actions are taken:

• C_STAT is set to one of the following:
  ISTREAM_BC_DBL (Icache fill from Bcache)
  DSTREAM_BC_DBL (Dcache fill from Bcache)
  ISTREAM_MEM_DBL (Icache fill from memory)
  DSTREAM_MEM_DBL (Dcache fill from memory)


• C_ADDR contains bits [42:6] of the system memory fill address of the block that
  contains the error.
• When enabled, a machine check (MCHK) is posted. The MCHK is taken when not
  in PALmode.
• A double-bit fill error from memory, marked by the data’s corresponding ECC,
  when written to cache, also writes the corresponding ECC to cache. Any consumer
  of that error (such as another CPU) also consumes the corresponding ECC value.


Note: C_ADDR may be inaccurate in heavy traffic conditions. C_STAT is accurate.
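

The four double-bit fill cases form a simple lookup from fill target and fill source to
the C_STAT status name. The Python sketch below is illustrative only: the status names
come from this section, while the dictionary, the function name
double_bit_fill_status, and the string keys are hypothetical. The numeric C_STAT
encodings are not given in this section and are deliberately not modeled.

```python
# (fill target, fill source) -> C_STAT status name listed in Section 8.12
DOUBLE_BIT_FILL_STATUS = {
    ("icache", "bcache"): "ISTREAM_BC_DBL",
    ("dcache", "bcache"): "DSTREAM_BC_DBL",
    ("icache", "memory"): "ISTREAM_MEM_DBL",
    ("dcache", "memory"): "DSTREAM_MEM_DBL",
}


def double_bit_fill_status(fill_target: str, fill_source: str) -> str:
    """Return the C_STAT status name for a double-bit fill error.

    Raises KeyError for any combination that Section 8.12 does not list.
    """
    return DOUBLE_BIT_FILL_STATUS[(fill_target.lower(), fill_source.lower())]


print(double_bit_fill_status("Dcache", "Bcache"))  # prints DSTREAM_BC_DBL
```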


8.13 Error Case Summary


Table 8-3 summarizes the various error cases and their ramifications. 


Table 8-3 Error Case Summary


Error                    Exception      Status                               Hardware Action      PALcode Action

Icache data or tag       CRD            ISTAT[PAR]                           Icache flushed       Log as CRD
parity error

Dcache tag parity        DFAULT         MM_STAT[DC_TAG_PERR]                                      Evict with two HW_LDs
error (on issue)                        VA[address]                                               and log as CRD

Dcache tag parity        MCHK¹          DC_STAT[TPERR_P0] or                                      Log as MCHK
error (on retry)                        DC_STAT[TPERR_P1]

Dcache single-bit        CRD            DC_STAT[ECC_ERR_LD]                  Corrected and        Log as CRD
ECC error on load                       C_STAT[DSTREAM_DC_ERR]               scrubbed
                                        C_ADDR[bits [19:6] of the error
                                        address. [42:20] not updated.]

Dcache single-bit        CRD            DC_STAT[ECC_ERR_LD]                  None                 Log as CRD
ECC error on                            C_STAT contains zero
speculative load

Dcache single-bit        CRD            DC_STAT[ECC_ERR_ST]                  Corrected and        Log as CRD
ECC error on small                                                           scrubbed
store

Dcache single-bit        None           None                                 Corrected and        None
ECC error on victim                                                          scrubbed
read

Dcache second error      MCHK¹          DC_STAT[SEO]                         No correction        Log as MCHK
on store                                                                     on either store

Dcache duplicate tag     MCHK¹          C_STAT[DC_PERR]                      Uncorrectable        Log as MCHK
parity error                            C_ADDR[error address]

Bcache tag parity        MCHK¹          C_STAT[BC_PERR]                      Uncorrectable        Log as MCHK
error                                   C_ADDR[error address]

Bcache single-bit        MCHK           C_STAT[ISTREAM_BC_ERR]               Icache flushed       Scrub error as described
error on Icache fill     and CRD²       C_ADDR[error address]                                     in Section D.36.
                                        C_SYNDROME_0                                              Log as CRD
                                        C_SYNDROME_1

Bcache single-bit        CRD            DC_STAT[ECC_ERR_LD]                  Corrected and        Scrub error as described
error on Dcache fill                    C_STAT[DSTREAM_BC_ERR]               scrubbed in          in Section D.36.
                                        C_ADDR[error address]                Dcache               Log as CRD
                                        C_SYNDROME_0
                                        C_SYNDROME_1

Bcache victim read       CRD            DC_STAT[ECC_ERR_LD]                  None                 Log as CRD
on Dcache/Bcache                        C_STAT contains 0
miss

Bcache victim read       None           None                                 None                 None
on ECB

Memory single-bit        MCHK           C_STAT[ISTREAM_MEM_ERR]              Icache flushed       Scrub error as described
error on Icache fill     and CRD²       C_ADDR[error address]                                     in Section D.36.
                                        C_SYNDROME_0                                              Log as CRD
                                        C_SYNDROME_1

Memory single-bit        CRD            DC_STAT[ECC_ERR_LD]                  Corrected and        Scrub error as described
error on Dcache fill                    C_STAT[DSTREAM_MEM_ERR]              scrubbed in          in Section D.36.
                                        C_ADDR[error address]                Dcache               Log as CRD
                                        C_SYNDROME_0
                                        C_SYNDROME_1

Bcache single-bit        CRD            C_STAT[PROBE_BC_ERR]                 None                 May scrub error as
error on a probe hit                    C_ADDR[error address]⁴                                    described in Section
                                        C_SYNDROME_0                                              D.36.
                                        C_SYNDROME_1                                              Log as CRD

Bcache double-bit        MCHK¹          C_STAT[ISTREAM_BC_DBL]               None                 Log as MCHK
error on Icache fill                    C_ADDR[error address]⁴

Bcache double-bit        MCHK¹          C_STAT[DSTREAM_BC_DBL]               None                 Log as MCHK
error on Dcache fill                    C_ADDR[error address]⁴

Memory double-bit        MCHK¹          C_STAT[ISTREAM_MEM_DBL]              None                 Log as MCHK
error on Icache fill                    C_ADDR[error address]⁴

Memory double-bit        MCHK¹          C_STAT[DSTREAM_MEM_DBL]              None                 Log as MCHK
error on Dcache fill                    C_ADDR[error address]⁴


¹ Machine check taken in native mode. It is deferred while in PALmode.
² CRD error posted in case the machine check is down a speculative path.
³ For a single-bit error on a non-target quadword, the error is not corrected in hardware,
  but is corrected by PALcode during the scrub operation.
⁴ The contents of C_ADDR may not be accurate when there is heavy cache fill traffic.


Electrical Data





This chapter describes the electrical characteristics of the 21264A and its interface pins. 
The chapter contains both ac and dc electrical characteristics and power supply consid- 
erations, and is organized as follows: 


• Electrical characteristics
• DC characteristics
• Power supply sequencing
• AC characteristics


9.1 Electrical Characteristics


Table 9-1 lists the maximum electrical ratings for the 21264A. 


Table 9-1 Maximum Electrical Ratings 


Characteristics                        Ratings

Storage temperature                    -55° C to +125° C (-67° F to 257° F)
Junction temperature                   0° C to 100° C (32° F to 212° F)
Maximum dc voltage on signal pins      VDD + 400 mV
Minimum dc voltage on signal pins      VSS - 400 mV
Maximum power @ indicated VDD          Frequency     Peak Power
for the following frequencies:         600 MHz       73 W @ 2.1 V
                                       667 MHz       80 W @ 2.1 V
                                       700 MHz       85 W @ 2.1 V
                                       733 MHz       88 W @ 2.1 V
                                       750 MHz       90 W @ 2.1 V

Notes: Stresses above those listed under the given maximum electrical ratings may
cause permanent device failure. Functionality at or above these
limits is not implied. Exposure to these limits for extended periods of time
may affect device reliability.


Power data is preliminary and based on measurements from a limited set of
material.


9.2 DC Characteristics


This section contains the dc characteristics for the 21264A. The 21264A pins can be 
divided into 10 distinct electrical signal types. The mapping between these signal types 
and the package pins is shown in Chapter 3. Table 9-2 shows the signal types.


Table 9-2 Signal Types


Signal Type     Description

I_DC_POWER      Supply voltage pins (VDD/PLL_VDD)
I_DC_REF        Input dc reference pin
I_DA            Input differential amplifier receiver
I_DA_CLK        Input differential amplifier clock receiver
O_OD            Open-drain output driver
O_OD_TP         Open-drain driver for test pins
O_PP            Push-pull output driver
O_PP_CLK        Push-pull output clock driver
B_DA_OD         Bidirectional differential amplifier receiver - open-drain
B_DA_PP         Bidirectional differential amplifier receiver - push-pull


Tables 9-3 through 9-12 show the dc switching characteristics of each signal type. 
Also, the following notes apply to Tables 9-3 to 9-12. 


1. The differential voltage, Vdiff, is the absolute difference between the differential
   input pins.
2. Delta VBIAS is defined as the open-circuit differential voltage on the appropriate
   differential pairs. The test condition for these inputs is to let the input network
   self bias and measure the open-circuit voltage. The test load must be ≥ 1M ohm. In
   normal operation, these inputs are coupled with a 680-pF capacitor.
3. Functional operation of the 21264A with less than all VDD and VSS pins
   connected is not implied.
4. Please see the special supply decoupling and noise requirements for the PLL_VDD
   outlined in the 21264A PLL Specifications.
5. The test load is a 50-ohm resistor to VDD/2. The resistor can be connected to the
   21264A pin by a 50-ohm transmission line of any length.
6. DC test conditions set the minimum swing required. These dc limits set the trip
   point precision.
7. Input pin capacitance values include 2.0 pF added for package capacitance.


Note: Current out of a 21264A pin is represented by a - symbol while a + symbol
indicates current flowing into a 21264A pin.


Table 9-3 VDD (I_DC_POWER)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VDD             Processor core supply voltage       —                   1.9 V             2.1 V
Power (sleep)   Processor power required (sleep)    @ VDD = 2.1 V       —                 19 W¹
                                                    Note 3
PLL_VDD         PLL supply voltage (Note 4)         —                   3.135 V           3.465 V
PLL_IDD         PLL supply current (running)        Freq = 600 MHz      —                 25 mA

¹ Power measured at 37.5 MHz while running the “Ebox aliveness test.”


Table 9-4 Input DC Reference Pin (I_DC_REF)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VREF            DC input reference voltage          —                   600 mV            VDD - 650 mV
|II|            Input current                       VSS ≤ V ≤ VDD       —                 150 µA


Table 9-5 Input Differential Amplifier Receiver (I_DA)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VIL             Low-level input voltage             Note 6              —                 VREF - 200 mV
VIH             High-level input voltage            —                   VREF + 200 mV     —
|II|            Input current                       VSS ≤ V ≤ VDD       —                 150 µA
CIN             Input-pin capacitance               Freq = 10 MHz       —                 5.7 pF
                                                    Note 7


Table 9-6 Input Differential Amplifier Clock Receiver (I_DA_CLK)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

Vdiff           Differential input voltage          —                   200 mV (Note 1)   —
|ΔVBIAS|        Open-circuit differential           I ≤ ±1 µA           —                 50 mV
                                                    Note 2
|II|            Input current                       VSS ≤ V ≤ VDD       —                 150 µA
CIN             Input-pin capacitance               Freq = 10 MHz       —                 5.0 pF
                                                    Note 7


Table 9-7 Pin Type: Open-Drain Output Driver (O_OD)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VOL             Low-level output voltage            IOL = 70 mA         —                 400 mV
|IOZ|           High impedance output current       0 < V < VDD         —                 150 µA
COD             Open-drain pin capacitance          Freq = 10 MHz       —                 5.7 pF
                                                    Note 7


Table 9-8 Bidirectional, Differential Amplifier Receiver, Open-Drain Output Driver (B_DA_OD)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VIL             Low-level input voltage             Note 6              —                 VREF - 200 mV
VIH             High-level input voltage            —                   VREF + 200 mV     —
VOL             Low-level output voltage            IOL = 70 mA         —                 400 mV
|II|            Input current                       VSS ≤ V ≤ VDD       —                 150 µA¹
CIN             Input-pin capacitance               Freq = 10 MHz       —                 5.7 pF
                                                    Note 7

¹ Measurement taken with output driver disabled.


Table 9-9 Pin Type: Open-Drain Driver for Test Pins (O_OD_TP)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VOL             Low-level output voltage            IOL = 15 mA         —                 400 mV
|IOZ|           High-impedance output current       0 < V < VDD         —                 150 µA
COD_TP          Pin capacitance                     Freq = 10 MHz       —                 5.2 pF
                                                    Note 7


Table 9-10 Bidirectional, Differential Amplifier Receiver, Push-Pull Output Driver (B_DA_PP)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VIL             Low-level input voltage             —                   —                 VREF - 200 mV
VIH             High-level input voltage            —                   VREF + 200 mV     —
VOL             Low-level output voltage            IOL = 6 mA          —                 400 mV
VOH             High-level output voltage           IOH = -6 mA         VDD - 400 mV      —
|II|            Input current                       VSS ≤ V ≤ VDD       —                 150 µA¹
CIN             Input-pin capacitance               Freq = 10 MHz       —                 6.0 pF
                                                    Note 7

¹ Measurement taken with output driver disabled.


Table 9-11 Push-Pull Output Driver (O_PP)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VOL             Low-level output voltage            IOL = 40 mA         —                 500 mV
VOH             High-level output voltage           IOH = 40 mA         VDD - 500 mV      —
|IOZ|           High-impedance output current       0 < V < VDD         —                 150 µA
COD             Open-drain pin capacitance          Freq = 10 MHz       —                 6.0 pF
                                                    Note 7


Table 9-12 Push-Pull Output Clock Driver (O_PP_CLK)


Parameter
Symbol          Description                         Test Conditions     Minimum           Maximum

VOL             Low-level output voltage            Note 5              —                 VDD/2 - 325 mV
VOH             High-level output voltage           Note 5              VDD/2 + 325 mV    —
|IOZ|           High-impedance output current       0 < V < VDD         —                 40 mA¹

¹ Measured value includes current from onchip termination structures.


9.3 Power Supply Sequencing and Avoiding Potential Failure Mechanisms


Before the power-on sequencing can occur, systems should ensure that DCOK_H is 
deasserted and Reset_L is asserted. Then, systems ramp power to the 21264A 
PLL_VDD @ 3.3 V and the 21264A power planes (VDD @ 2.0 V, not to exceed 2.1 V 
under any circumstances), with PLL_VDD leading VDD. Systems should supply 
differential clocks to the 21264A on ClkIn_H and ClkIn_L. The clocks should be 
running as power is supplied. 


When enabling the power supply inputs in a system, three failure mechanisms must be 
avoided: 


1. Bidirectional signal buses must not conflict during power-up. A conflict on these 
buses can generate high current conditions, which can compromise the reliability of 
the associated chips. 


2. Similarly, input receivers should not see intermediate voltage levels that can also 
generate high current conditions, which can compromise the reliability of the 
receiving chip. 


3. Finally, no CMOS chip should see an input voltage that is higher than its internal 
VDD. In such a condition, a reasonable level of charge can be injected into the bulk 
of the die. This condition can expose the chip to a positive-feedback latchup 
condition. 


The 21264A addresses those three failure mechanisms by disabling all of its outputs
and bidirectional pins (with three exceptions) until the assertion of DCOK_H. The
three exceptions are Tdo_H, EV6Clk_L, and EV6Clk_H. Tdo_H is used only in the
tester environment and does not need to be disabled. EV6Clk_L and EV6Clk_H are
outputs that are both generated and consumed by the 21264A; thus, VDD tracks for
both the producer and consumer.


On the push-pull interfaces:


• Disabling all output drivers leaves the output signal at the DC bias point of the
  termination network.
• Disabling the bidirectional drivers leaves the other consumers of the bus as the bus
  master.


On the open-drain interfaces:


• Disabling all output drivers leaves the output signal at the voltage of the open-drain
  pull-up.
• Disabling all bidirectional drivers leaves the other consumers of the bus as the bus
  master.


To avoid failure mechanism number two, systems must sequence and control external
signal flow in such a way as to avoid zero differential into the 21264A input receivers
(I_DA, I_DA_CLK, B_DA_OD, and B_DA_PP). Finally, to avoid failure
mechanism number three, systems must sequence input and bidirectional pins (I_DA,
I_DA_CLK, B_DA_OD, B_DA_PP, and I_DC_REF) such that the 21264A does not
see a voltage above its VDD.
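

Two of the constraints above lend themselves to a simple check against sampled
voltages during bring-up: VDD must never exceed 2.1 V, and no input or bidirectional
pin may sit above the 21264A VDD. The Python sketch below is a hedged illustration
for bench or simulation scripts, not a qualified test procedure. The function name
powerup_violations, the dictionary of pin voltages, and the pin name in the example
are all hypothetical.

```python
VDD_ABSOLUTE_MAX_V = 2.1  # "not to exceed 2.1 V under any circumstances"


def powerup_violations(vdd_v, input_pin_volts):
    """Return a list of rule violations for one sample taken during power-up.

    vdd_v           -- measured 21264A core supply voltage, in volts
    input_pin_volts -- mapping of pin name to measured pin voltage, in volts
    """
    problems = []
    # Core supply ceiling from Section 9.3.
    if vdd_v > VDD_ABSOLUTE_MAX_V:
        problems.append(f"VDD at {vdd_v:.2f} V exceeds {VDD_ABSOLUTE_MAX_V} V")
    # Failure mechanism three: an input must not be above the chip's VDD.
    for pin, volts in sorted(input_pin_volts.items()):
        if volts > vdd_v:
            problems.append(f"{pin} at {volts:.2f} V is above VDD ({vdd_v:.2f} V)")
    return problems


# Example: VDD has ramped only to 1.0 V while an input is already driven to 1.5 V.
print(powerup_violations(1.0, {"SysAddIn_L[0]": 1.5}))
```

The sketch does not attempt to check failure mechanisms one and two (bus conflicts and
zero differential at the receivers), which depend on system-level sequencing rather
than on a single voltage sample.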


In addition, as power is being ramped, Reset_L must be asserted — this allows the 
21264A to reset internal state. Once the target voltage levels are attained, systems 
should assert DCOK_H. This indicates to the 21264A that internal logic functions can 
be evaluated correctly and that the power-up sequence should be continued. Prior to 
DCOK_H being asserted, the logic internal to the 21264A is being reset and the 
internal clock network is running (either clocked by the VCO, which is at a nominal 
speed, or by ClkIn_H, if the PLL is bypassed). 


The reset state machine is in state WAIT_SETTLE. 


9.4 AC Characteristics 


Abbreviations: 
The following abbreviations apply to Table 9-13: 
• TSU = Setup time
• Duty cycle = Minimum clock duty cycle
• TDH = Hold time
• Slew rate = referenced to signal edge


AC Test Conditions: 
The following conditions apply to the measurements that are listed in Table 9-13: 


• VDD is in the range between 1.9 V and 2.1 V.
• SysVref is VDD/2 Volts.
• BcVref is 0.75 Volts.
• The input voltage swing is Vref ± 0.40 Volts.
• All output skew data is based on simulation into a 50-ohm transmission line that is
  terminated with 50 ohms to VDD/2 for Bcache timing, and with 50 ohms to VDD
  for all other timing.
• Timings are measured at the pins as follows:
  - For open-drain outputs, timing is measured to (VOL + Vterm)/2, where Vterm is
    the off-chip termination voltage for system signals.
  - For non-open-drain outputs, timing is measured to (VOL + VOH)/2.
  - For all inputs other than type I_DA_CLK, timing is measured to the point
    where the input signal crosses VREF.
  - For type I_DA_CLK inputs, timing is measured when the voltage on the
    complementary inputs is equal.
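

The measurement points above are midpoints between two voltage levels. The Python
sketch below shows the arithmetic under stated assumptions: the function names are
hypothetical, the open-drain and push-pull formulas are read as (VOL + Vterm)/2 and
(VOL + VOH)/2, and the example voltages are illustrative values rather than figures
taken from Table 9-13.

```python
BC_VREF_V = 0.75  # BcVref under the AC test conditions, in volts


def sys_vref(vdd_v):
    """SysVref under the AC test conditions: VDD/2."""
    return vdd_v / 2


def open_drain_threshold(vol_v, vterm_v):
    """Measurement point for open-drain outputs: midpoint of VOL and Vterm."""
    return (vol_v + vterm_v) / 2


def push_pull_threshold(vol_v, voh_v):
    """Measurement point for non-open-drain outputs: midpoint of VOL and VOH."""
    return (vol_v + voh_v) / 2


# Illustrative values only: VDD = 2.0 V, VOL = 0.4 V, Vterm = 2.0 V, VOH = 1.6 V.
print(sys_vref(2.0))
print(open_drain_threshold(0.4, 2.0))
print(push_pull_threshold(0.4, 1.6))
```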


Table 9-13 AC Specifications


Signal Name  Type  Reference Signal  TSU¹  TDH²  TSkew  Duty Cycle  TSlew

SysAddIn_L[14:0]  I_DA  SysAddInClk_L  400 ps  400 ps  NA  NA  1.0 V/ns
SysFillValid_L  I_DA  SysAddInClk_L  400 ps  400 ps  NA  NA  1.0 V/ns
SysDataInValid_L  I_DA  SysAddInClk_L  400 ps  400 ps  NA  NA  1.0 V/ns
SysDataOutValid_L  I_DA  SysAddInClk_L  400 ps  400 ps  NA  NA  1.0 V/ns
SysAddInClk_L  I_DA  NA  NA  NA  NA  45-55%  1.0 V/ns
SysAddOut_L[14:0]  O_OD  SysAddOutClk_L  NA  NA  ±300 ps³  NA  NA
SysAddOutClk_L  O_OD  EV6Clk_x  NA  NA  ±400 ps  45-55%  NA
SysData_L[63:0]  B_DA_OD  SysDataInClk_H[7:0]  400 ps  400 ps  NA  NA  1.0 V/ns
    SysDataOutClk_L[7:0]⁴  NA  NA  ±300 ps³  NA  NA
SysCheck_L[7:0]  B_DA_OD  SysDataInClk_H[7:0]  400 ps  400 ps  NA  NA  1.0 V/ns
    SysDataOutClk_L[7:0]⁴  NA  NA  ±300 ps³  NA  NA
SysDataInClk_H[7:0]  I_DA  NA  NA  NA  NA  45-55%  1.0 V/ns
SysDataOutClk_L[7:0]  O_OD  EV6Clk_x  NA  NA  ±400 ps  45-55%  NA
BcAdd_H[23:4]  O_PP  BcTagOutClk_x  NA  NA  ±300 ps⁵˒⁶  NA  —
BcDataOE_L  O_PP  BcDataOutClk_x[3:0]⁷  45-55%  —
BcLoad_L  O_PP  38-63%⁸  —
BcDataWr_L  O_PP  40-60%⁹  —
BcData_H[127:0]  B_DA_PP  BcDataOutClk_x[3:0]¹⁰  NA  NA  ±300 ps⁶  45-55%  1.0 V/ns
    38-63%  NA
    40-60%⁹  NA
    BcDataInClk_H[7:0]  400 ps  400 ps  NA  NA  NA
BcDataInClk_H[7:0]  I_DA  NA  NA  NA  NA  45-55%
BcDataOutClk_H[3:0]  O_PP  EV6Clk_x  NA  NA  ±400 ps
BcDataOutClk_L[3:0]  O_PP  EV6Clk_x  NA  NA  ±400 ps
BcTag_H[42:20]  B_DA_PP  BcTagInClk_H  400 ps  400 ps  NA  NA  1.0 V/ns
BcTagValid_H  B_DA_PP  BcTagOutClk_x  NA  NA  ±300 ps⁶  45-55%  NA
BcTagDirty_H  B_DA_PP  38-63%  NA
BcTagShared_H  B_DA_PP  40-60%⁹  NA
BcTagParity_H  B_DA_PP
BcTagOE_L  O_PP
BcTagWr_L  O_PP
BcTagInClk_H  I_DA  NA  NA  NA  NA  45-55%
BcTagOutClk_x  O_PP  EV6Clk_x  NA  NA  ±400 ps
IRQ_H[5:0]  I_DA  DCOK_H  10 ns¹¹  10 ns¹¹  NA  NA  100 mV/ns
Reset_L¹²  I_DA  NA  NA  NA  NA  100 mV/ns
DCOK_H¹³  I_DA  NA  NA  NA  NA  100 mV/ns
PllBypass_H¹⁴  I_DA  NA  NA  NA  NA  100 mV/ns
ClkIn_x¹⁵  I_DA_CLK  NA  NA  NA  40-60%¹⁶  1.0 V/ns
FrameClk_x¹⁷  I_DA_CLK  ClkIn_x  400 ps  400 ps  NA  NA  1.0 V/ns
EV6Clk_x¹⁸  O_PP_CLK  ClkIn_x  NA  NA  ±1.0 ns  YDiv ± 5%  NA
EV6Clk_x¹⁹  Cycle Compression Specification: See Note 19
ClkFwdRst_H  I_DA  FrameClk_x  400 ps  400 ps  NA  NA  1.0 V/ns
SromData_H  I_DA  SromClk_H  2.0 ns  2.0 ns  NA  100 mV/ns
SromOE_L  O_OD  EV6Clk_x  NA  NA  ±2.0 ns
SromClk_H²⁰  O_OD  EV6Clk_x  NA  NA  ±7.0 ns
Tms_H  I_DA  Tck_H  2.0 ns  2.0 ns  NA  NA  100 mV/ns
Trst_L²¹  I_DA  Tck_H  NA  NA  NA  NA  100 mV/ns
Tdi_H  I_DA  Tck_H  2.0 ns  2.0 ns  NA  NA  100 mV/ns
Tdo_H  O_OD  Tck_H  NA  NA  ±7.0 ns  NA  NA
Tck_H  I_DA  IEEE 1149.1 Port Freq. = 5.0 MHz Max.  NA  NA  NA  45-55%  100 mV/ns
TestStat_H  O_OD  EV6Clk_x  NA  NA  ±40 ns  NA  NA





f 


i 
2 
3 


The TSU specified for all clock-forwarded signal groups is with respect to the associated clock. 
The TDH specified for all clock-forwarded signal groups is with respect to the associated clock. 


The TSkew value applies only when the SYS_CLK_DELAY/{0:1] entry in the Cbox WRITE_ONCE 
chain (Table 5-24) is set to zero phases of delay between forwarded clock out and address/data. 


The TSkew specified for SysData_L signals is only with respect to the associated clock. 


These signals should be referenced to BeTagOutClk_x when measuring TSkew, provided that 
BcTagOutClkl_x and BcDataOutClk_x have no programmed offset. 


The TSkew value applies only when the BC_CLK_DELAY{0:1] entry in the Cbox WRITE_ONCE 
chain (Table 5-24) is set to zero phases of delay for Bcache clock. 


The TSkew specified for BcAdd_H signals is only with respect to the associated clock. 
The duty cycle for 2.5X single data mode 2 GCLK phases high and 3 GCLK phases low. 
The duty cycle for 3.5X single data mode 3 GCLK phases high and 4 GCLK phases low. 
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10 The TSkew specified for BcData_H signals is only with respect to the associated clock pair. 


1! ¥ROQ_HI5:0] must have their TSU and TDH times referenced to DCOK_H during power-up to ensure 
the correct Y divider and resulting EV6CIk_x duty cycle. When the 21264A is executing instructions 
IRQ_H[5:0] act as normal asynchronous pins to handle interrupts. 


12 Reset_L is an asynchronous pin. It may be asserted asynchronously. 
13 DCOK_H is an asynchronous pin. Note the minimum slew rate on the assertion edge. 


14 PllBypass_H may not switch when ClkIn_x is running. This pin must either be deasserted during 
power-up or the 21264A core power pin (VDD pins) indicating the 21264A’s internal PLL will be 
used. Note that it is illegal to use PllBypass_H asserted during power-up unless a ClkIn_x is 
present. 


15 See Section 7.11.2 for a discussion of ClkIn_x as it relates to operating the 21264A’s internal PLL 
versus running the 21264A in PLL bypass mode. ClkIn_x has specific input jitter requirements to 
ensure optimum performance of the internal 21264A PLL. 

In PLL bypass mode, duty cycle deviation from 50%-50% directly degrades device operating 
frequency. 

The TSU and TDH of FrameClk_x are referenced to the deasserting edge of ClkIn_x. 

This signal is a feedback to the internal PLL and may be monitored for overall 21264A jitter. It can 
also be used as a feedback signal to an external PLL when in PLL bypass mode. Proper termination of 
EV6Clk_x is imperative. 

The cycle or phase cannot be more than 5% shorter than the nominal. Do not confuse this 
measurement with duty cycle. Refer to Appendix F in the 21264A PLL Specification for the system 
measurement procedure. 

The period for SromClk_H is 256 GCLK cycles. 


When Trst_L is deasserted, Tms_H must not change state. Trst_L is asserted asynchronously but 
may be deasserted synchronously. 


Thermal Management 





This chapter describes the 21264A thermal management and thermal design 
considerations, and is organized as follows: 


• Operating temperature 


• Heat sink specifications 


• Thermal design considerations 


10.1 Operating Temperature 


The 21264A is specified to operate when the temperature at the center of the heat sink 
(Tc) is as shown in Table 10-1. Temperature Tc should be measured at the center of the 
heat sink, between the two package studs. The GRAFOIL pad is the interface material 
between the package and the heat sink. 


Table 10-1 Operating Temperature at Heat Sink Center (Tc) 


Frequency     Tc 
600 MHz       80.2°C 
667 MHz       78.1°C 
700 MHz       76.9°C 
733 MHz       76.0°C 
750 MHz       75.4°C 


Note: Compaq recommends using the heat sink because it greatly improves the 
ambient temperature requirement. 


Table 10-2 lists the values for the center of heat-sink-to-ambient thermal resistance (θca) for the 
21264A 587-pin PGA. Tables 10-3 through 10-7 show the allowable ambient temperature Ta 
(without exceeding Tc) at various airflows. 


Table 10-2 θca at Various Airflows for 21264A 


Airflow (linear ft/min)                  100    200    400    800    1000 
θca with heat sink type 1 (°C/W)         2.0    1.2    0.65   0.40   0.37 
θca with heat sink type 2 (°C/W)         1.4    0.78   0.45   0.33   0.31 
θca with heat sink type 3¹ (°C/W)        — 0.38 — 

¹ Heat sink type 3 has an 80 mm x 80 mm x 15 mm fan attached. 
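The ambient limits in the tables that follow appear consistent with the usual thermal-resistance relation Ta = Tc - P x θca. The sketch below is only an illustration of that relation: the function names are hypothetical, and the power figure is not stated in this chapter but is inferred here from one table entry (600 MHz, heat sink type 1, 400 linear ft/min).

```python
def implied_power_w(tc_c: float, ta_c: float, theta_ca: float) -> float:
    """Power (W) implied by one table entry, assuming Ta = Tc - P * theta_ca."""
    return (tc_c - ta_c) / theta_ca


def max_ambient_c(tc_c: float, theta_ca: float, power_w: float) -> float:
    """Maximum ambient temperature (deg C) for a given dissipation."""
    return tc_c - power_w * theta_ca


# 600 MHz: Tc = 80.2 (Table 10-1), theta_ca = 0.65 (Table 10-2),
# Ta = 37.3 (Table 10-3, heat sink type 1, 400 linear ft/min).
power = implied_power_w(80.2, 37.3, 0.65)

# Re-derive the 800 linear ft/min entry for the same heat sink (theta_ca = 0.40).
ta_800 = max_ambient_c(80.2, 0.40, power)
print(f"implied power: {power:.1f} W, Ta at 800 lfm: {ta_800:.1f} C")
```

Treat the inferred power as an assumption for cross-checking the tables, not as a specified dissipation value.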


Table 10-3 Maximum Ta for 21264A @ 600 MHz and @ 2.0 V with Various Airflows 


Airflow (linear ft/min)                    100    200    400    800    1000 
Maximum Ta with heat sink type 1 (°C)      —      —      37.3   53.8   55.8 
Maximum Ta with heat sink type 2 (°C)      —      28.7   50.5   58.4   59.7 
Maximum Ta with heat sink type 3¹ (°C)     — 55.1 — 


¹ Heat sink type 3 has an 80 mm x 80 mm x 15 mm fan attached. 


Table 10-4 Maximum Ta for 21264A @ 667 MHz and @ 2.0 V with Various Airflows 


Airflow (linear ft/min)                    100    200    400    800    1000 
Maximum Ta with heat sink type 1 (°C)      —      —      30.7   48.9   51.1 
Maximum Ta with heat sink type 2 (°C)      —      21.2   45.3   54.0   55.5 
Maximum Ta with heat sink type 3¹ (°C)     — 50.4 — 


¹ Heat sink type 3 has an 80 mm x 80 mm x 15 mm fan attached. 


Table 10-5 Maximum Ta for 21264A @ 700 MHz and @ 2.0 V with Various Airflows 


Airflow (linear ft/min)                    100    200    400    800    1000 
Maximum Ta with heat sink type 1 (°C)      —      —      26.9   46.1   48.4 
Maximum Ta with heat sink type 2 (°C)      —      —      42.2   51.5   53.0 
Maximum Ta with heat sink type 3¹ (°C)     — 47.6 — 


¹ Heat sink type 3 has an 80 mm x 80 mm x 15 mm fan attached. 


Table 10-6 Maximum Ta for 21264A @ 733 MHz and @ 2.0 V with Various Airflows 


Airflow (linear ft/min)                    100    200    400    800    1000 
Maximum Ta with heat sink type 1 (°C)      —      —      24.0   44.0   46.4 
Maximum Ta with heat sink type 2 (°C)      —      —      40.0   49.6   51.2 
Maximum Ta with heat sink type 3¹ (°C)     — 45.6 — 


¹ Heat sink type 3 has an 80 mm x 80 mm x 15 mm fan attached. 


Table 10-7 Maximum Ta for 21264A @ 750 MHz and @ 2.0 V with Various Airflows 


Airflow (linear ft/min)                    100    200    400    800    1000 
Maximum Ta with heat sink type 1 (°C)      —      —      22.1   42.6   45.1 
Maximum Ta with heat sink type 2 (°C)      —      —      38.5   48.4   50.0 
Maximum Ta with heat sink type 3¹ (°C)     — 44.3 — 


¹ Heat sink type 3 has an 80 mm x 80 mm x 15 mm fan attached. 
10.2 Heat Sink Specifications 


Three heat sink types are specified. The mounting holes for all three are in line with the 
cooling fins. 


Figure 10-1 shows heat sink type 1, along with its approximate dimensions. 


Figure 10-1 Type 1 Heat Sink 


[Figure not reproduced. Labeled dimensions: 80.5 mm (3.17 in) and 32.5 mm (1.280 in).] 
Figure 10-2 shows heat sink type 2, along with its approximate dimensions. 


Figure 10-2 Type 2 Heat Sink 


[Figure not reproduced. Labeled dimensions: 81.0 mm (3.19 in), 81.0 mm (3.19 in), and 44.5 mm (1.75 in).] 
Figure 10-3 shows heat sink type 3, along with its approximate dimensions. 
The cooling fins of heat sink type 3 are cross-cut. Also, an 80 mm x 80 mm x 15 mm 
fan is attached to heat sink type 3. 


Figure 10-3 Type 3 Heat Sink 


[Figure not reproduced. Labeled dimensions: 80.0 mm (3.15 in) and 71.5 mm (2.815 in).] 
10.3 Thermal Design Considerations 


Follow these guidelines for printed circuit board (PCB) component placement: 


• Orient the 21264A on the PCB with the heat sink fins aligned with the airflow 
direction. 


• Avoid preheating ambient air. Place the 21264A on the PCB so that inlet air is not 
preheated by any other PCB components. 


• Do not place other high power devices in the vicinity of the 21264A. 


• Do not restrict the airflow across the 21264A heat sink. Placement of other devices 
must allow for maximum system airflow in order to maximize the performance of the 
heat sink. 
Testability and Diagnostics 





This chapter describes the 21264A user-oriented testability and diagnostic features. 
These features include automatic power-up self-test, Icache initialization from external 
serial ROMs, and the serial diagnostic terminal port. 


The boundary-scan register, which is another testability and diagnostic feature, is listed 
in Appendix B. The boundary-scan register is compatible with IEEE Standard 1149.1. 


This chapter is organized as follows: 

• Test pins 

• SROM/serial diagnostic terminal port 

• IEEE 1149.1 Port 

• TestStat_H Pin 

• Power-up self-test and initialization 

• Notes on IEEE 1149.1 operation and compliance 


The 21264A has several manufacturing test features that are used only by the factory, 
and they are beyond the scope of this chapter. 


11.1 Test Pins 


The 21264A test access ports include the IEEE 1149.1 test access port, a dual-purpose 
SROM/Serial diagnostic terminal port, and a test status output pin. Table 11-1 lists the 
test access port pins. 


Table 11-1 Dedicated Test Port Pins 


Pin Name      Type     Function 

Tms_H         Input    IEEE 1149.1 test mode select 
Tdi_H         Input    IEEE 1149.1 test data in 
Trst_L        Input    IEEE 1149.1 test logic reset 
Tck_H         Input    IEEE 1149.1 test clock 
Tdo_H         Output   IEEE 1149.1 test data output 
SromData_H    Input    SROM data/Diagnostic terminal data input 
SromClk_H     Output   SROM clock/Diagnostic terminal data output 
SromOE_L      Output   SROM enable/Diagnostic terminal enable 
TestStat_H    Output   BiST status/timeout output 


11.2 SROM/Serial Diagnostic Terminal Port 


This port supports two functions. During power-up, it supports automatic initialization 
of the Cbox configuration registers and the Icache from the system serial ROMs. After 
power-up, it supports a serial diagnostic terminal. 


11.2.1 SROM Load Operation 


The following actions are performed while the SROM is loaded: 


• The SromOE_L pin supplies the output enable as well as the reset to the serial 
ROM. (Refer to the serial ROM specifications for details.) The 21264A asserts this 
signal low for the duration of the Icache load from the serial ROM. When the load 
has been completed, the signal remains deasserted. 


• The SromClk_H pin supplies the clock to the SROM that causes it to advance to 
the next bit. Simultaneously, it causes the existing data on the SromData_H pin to 
be shifted into an internal shift register. The cycle time of this clock is 256 CPU 
clock cycles. (If the FASTROM flag is set, the cycle time is 16 CPU clock 
cycles.) The hold time on SromData_H is 2 x CPU cycle time with respect to 
SromClk_H. 


• The SromData_H pin reads data from the SROM. 


Every data and tag bit in Icache is loaded by that sequence. 
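As a rough illustration of the clocking described above, the following sketch computes the SromClk_H rate and the time needed to shift a given number of bits. The function names and the example frequency are illustrative assumptions; only the divide-by-256 and divide-by-16 (FASTROM) ratios come from the text.

```python
def srom_clk_hz(cpu_hz: float, fastrom: bool = False) -> float:
    """SromClk_H frequency: one SROM clock per 256 CPU cycles (16 with FASTROM)."""
    cycles_per_bit = 16 if fastrom else 256
    return cpu_hz / cycles_per_bit


def srom_load_seconds(n_bits: int, cpu_hz: float, fastrom: bool = False) -> float:
    """Time to shift n_bits from the SROM at one bit per SromClk_H cycle."""
    return n_bits / srom_clk_hz(cpu_hz, fastrom)


if __name__ == "__main__":
    cpu_hz = 600e6  # example CPU frequency, chosen for illustration only
    print(srom_clk_hz(cpu_hz))                 # normal rate
    print(srom_clk_hz(cpu_hz, fastrom=True))   # FASTROM rate
    print(srom_load_seconds(100_000, cpu_hz))  # hypothetical 100,000-bit image
```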


11.2.2 Serial Terminal Port 


After the SROM data is loaded into the Icache, the three SROM interface signals can be 
used as a software UART and the pins become parallel I/O pins that can drive a system 
debug or diagnostic terminal by using an interface such as RS422. 


The serial line interface is automatically enabled if the SromOE_L pin is wired to the 
following pins: 


• An active high enable RS422 (or 26LS32) driver, driving to SromData_H 
• An active high enable RS422 (or 26LS31) receiver, driven from SromClk_H 


After reset, SromClk_H is driven from the Ibox I_CTL[SL_XMIT]. This register is 
cleared during reset, so it starts driving as a 0, but it can be written by software. The 
data becomes available at the pin after the HW_MTPR instruction that wrote 
I_CTL[SL_XMIT] is retired. 
On the receive side, while in native mode, any transition on the Ibox I_CTL[SL_RCV], 
driven from the SromData_H pin, results in a trap to the PALcode interrupt 
handler. When in PALmode, all interrupts are blocked. The interrupt routine then 
begins sampling I_CTL[SL_RCV] under a software timing loop to input as much data 
as needed, using the chosen serial line protocol. 


11.3 IEEE 1149.1 Port 


The IEEE 1149.1 Test Access Port consists of the Tdi_H, Tdo_H, Tms_H, Tck_H, 
and Trst_L pins. These pins access the IEEE 1149.1 mandated public test features as 
well as several private chip manufacturing test features. 


The port meets all requirements of the standard except that there are no pull-ups on the 
Tdi_H, Tms_H, and Trst_L pins, as required by the present standard. 


The scope of 1149.1 compliant features on the 21264A is limited to the board level 
assembly verification test. Systems that do not intend to drive this port must 
terminate the port pins as follows: pull-ups on Tdi_H and Tms_H, pull-downs on Tck_H 
and Trst_L. 


The port logic consists of the usual standard compliant components, namely, the TAP 
Controller State Machine, the Instruction Register, and the Bypass Register. 


The Bypass Register provides a short shift path through the chip’s IEEE 1149.1 logic. It 
is generally useful at the board level testing. It consists of a 1-bit shift register. 


The Instruction Register holds test instructions. On the 21264A, this register is 5 bits 
wide. Table 11-2 describes the supported instructions. The instruction set supports 
several public and private instructions. The public instructions operate and produce 
behavior compliant with the standard. The private instructions are used for chip 
manufacturing test and must not be used outside of chip manufacturing. 


Table 11-2 IEEE 1149.1 Instructions and Opcodes 


Opcode    Instruction    Operation/Function 

00xxx     Private        These instructions are for factory test use only. The user must 
01xxx                    not load them as they may have a harmful effect on the 
10xxx                    21264A. 

11000     SAMPLE         IEEE 1149.1 SAMPLE instruction. 

11001     HIGHZ          IEEE 1149.1 HIGHZ instruction. 

11010     CLAMP          IEEE 1149.1 CLAMP instruction. 

11011     EXTEST         IEEE 1149.1 EXTEST instruction. 

11100     Private        These instructions are for factory test use only. The user must 
11101                    not load them as they may have a harmful effect on the 
11110                    21264A. 

11111     BYPASS         IEEE 1149.1 BYPASS instruction. 
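The opcode assignments in Table 11-2 can be captured in a small lookup, for example in board-test tooling that must refuse private opcodes. This is a minimal sketch; the names `PUBLIC_INSTRUCTIONS` and `decode_ir` are hypothetical and not part of any 21264A software.

```python
# Public IEEE 1149.1 instructions from Table 11-2 (5-bit instruction register).
PUBLIC_INSTRUCTIONS = {
    0b11000: "SAMPLE",
    0b11001: "HIGHZ",
    0b11010: "CLAMP",
    0b11011: "EXTEST",
    0b11111: "BYPASS",
}


def decode_ir(opcode: int) -> str:
    """Return the instruction name, or "PRIVATE" for factory-only opcodes."""
    if not 0 <= opcode <= 0b11111:
        raise ValueError("instruction register is 5 bits wide")
    return PUBLIC_INSTRUCTIONS.get(opcode, "PRIVATE")


def is_safe_to_load(opcode: int) -> bool:
    """Only public instructions may be loaded outside chip manufacturing."""
    return decode_ir(opcode) != "PRIVATE"
```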


Figure 11-3 shows the TAP controller state machine state diagram. The signal Tms_H 
controls the state transitions that occur with the rising clock edge. TAP state machine 
states are decoded and used for initiating various actions for testing. 


Table 11-3 TAP Controller State Machine 
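Because the state diagram itself is not reproduced here, the sketch below encodes the standard 16-state TAP controller defined by IEEE 1149.1, on the assumption that the 21264A implements it unmodified (the text states that the port logic is standard compliant). State names follow the standard; `step` and `run` are hypothetical helper names.

```python
# Next state for each TAP state, indexed by the Tms_H value sampled on the
# rising edge of Tck_H: (next state if Tms_H = 0, next state if Tms_H = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state: str, tms: int) -> str:
    """Advance one Tck_H rising edge with the given Tms_H level (0 or 1)."""
    return TAP_NEXT[state][tms]


def run(state: str, tms_bits) -> str:
    """Apply a sequence of Tms_H values and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state
```

A property of this state machine is that holding Tms_H high for five clocks returns the controller to Test-Logic-Reset from any state, which is why a system can recover the port even without asserting Trst_L.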
11.4 TestStat_H Pin 


The TestStat_H pin serves two purposes. During power-up, it indicates BiST pass/fail 
status. After power-up, it indicates the 21264A timeout event. 


The system reset forces TestStat_H low. The Tbox forces it high during the internal BiST 
and array initialization operations. During result extraction (DoResult state), the Tbox 
drives it low for 16 cycles. After that, the pin remains low if the BiST has passed; 
otherwise, it is asserted high and remains high until the chip is reset again. Figure 11-1 
shows the behavior of the pin during the power-up operations. 


Note: A system designer may sample the TestStat_H pin on the first rising edge 
of the SromClk_H pin to determine BiST results. After power-up, during 
normal chip operation, whenever the 21264A does not retire an 
instruction for 2K CPU cycles, the pin is asserted high for 3 CPU cycles. 
Figure 11-1 TestStat_H Pin Timing During Power-Up Built-In Self-Test (BiST) 


Figure 11-2 TestStat_H Pin Timing During Built-In Self-Initialization (BiSI) 


[Timing diagram not reproduced. Signal labels recoverable from the diagram: Tbox_Rst_A_L, 
TBox Reset Engine state, TestStat_H, and ClkFwdRst_L.] 


11.5 Power-Up Self-Test and Initialization 


Upon powering up, the 21264A automatically performs the self-test of all major 
embedded RAM arrays and then loads the Cbox configuration registers and the instruc- 
tion cache from the system SROM. The chip’s internal logic is held in reset during 
these operations. See Chapter 9 for sequencing of power-up operations. 


11.5.1 Built-in Self-Test 


The power-up self-test is performed on the instruction cache and tag arrays, the data 
cache and tag arrays, the triplicate tag arrays, and the various RAM arrays located in the 
branch history table logic. The power-up self-test lasts for approximately 700,000 CPU 
cycles. The result of self-test is made available as Pass/Fail status on the TestStat_H 
pin (see Section 11.4). 


The result of self-test is also available in an IPR bit. Software can read this status 
through IPR I_CTL(23) (0 = pass, 1 = fail). See Section 5.2.15. 


The power-up BiST leaves all bits in all arrays initialized to zeroes. The instruction 
cache and the tag are reinitialized as part of the SROM initialization step. This is 
detailed in Section 11.5.2. 


11.5.2 SROM Initialization 


Power-up initialization on the 21264A is different from previous generation Alpha 
systems in two aspects. First, in the 21264A systems, the presence of serial ROMs is 
mandatory as initialization of several Cbox configuration registers depends on them. 
Second, it is possible to skip or partially fill Icache from serial ROMs. Figure 11-3 
shows the map of the data in serial ROMs. 




In the SROM represented in Figure 11-3, the combined length of the fields Cbox Config 
Data(0,n) and MBZ(m,0) must equal 367 bits. (If Cbox Config Data(0,n) is 
(0,366), the MBZ field has zero length.) 


For the 21264A, Cbox Config Data is 304 bits, so the value of n is 303. 
Therefore, the MBZ field for Pass 3 is: 
MBZ(m,0) = 367 minus 304 = 63 bits = (62,0) 
Tables 11-4 and 5-24 describe the details of the Icache and Cbox bit fields, 
respectively. Note that the fetch count is a multiple of four. 
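The MBZ arithmetic above can be expressed as a short helper. This is a sketch only; the function name `mbz_field` and its return convention are illustrative assumptions, while the 367-bit total and the 304-bit Cbox length come from the text.

```python
CONFIG_PLUS_MBZ_BITS = 367  # Cbox Config Data plus MBZ, per the text above


def mbz_field(cbox_config_bits: int):
    """Return (MBZ length in bits, (m, 0) bit range or None if the field is empty)."""
    mbz_bits = CONFIG_PLUS_MBZ_BITS - cbox_config_bits
    if mbz_bits < 0:
        raise ValueError("Cbox Config Data cannot exceed 367 bits")
    bit_range = (mbz_bits - 1, 0) if mbz_bits else None
    return mbz_bits, bit_range


if __name__ == "__main__":
    print(mbz_field(304))  # 21264A case described above
    print(mbz_field(367))  # Cbox Config Data(0,366): no MBZ bits
```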


Figure 11-3 SROM Content Map 


fetch[0](0,192) | fetch[j-1](0,192) | fetch[j](0,192) | fetch_count(11,0) | Cbox Config Data(0,n) | MBZ(m,0) 
(first block)                         (last block) 





11.5.2.1 Serial Instruction Cache Load Operation 


All Icache bits, including each block’s tag, address space number (ASN), address space 
match (ASM), and valid and branch history bits are loaded serially from offchip serial 
ROMs. Once the serial load has been invoked by the chip reset sequence, the cache is 
loaded from the lower to the higher addresses. 


The serial Icache fill invoked by the chip reset sequence operates internally at a 
frequency of GCLK/256. 


Table 11-4 lists the Icache bit fields in an SROM line. Fetch bits are listed in the order 
of shift direction (top to bottom and left to right). In Table 11-4: 


Bit Type    Meaning 

c           Disp_add carry 
i           Instruction 
iq          Iqueue predecodes 
tr          Trouble bits 
dv          Destination valid 
ea          Ea_src 
par-MBZ     Must be zero 


The load occurs at the rate of 1 bit per 256 CPU cycles. The chip outputs a 50% duty 
cycle clock on the SromClk_H pin. 


The serial ROMs can contain enough Alpha code to complete the configuration of the 
external interface (for example, set the timing on the external cache RAMs, and diag- 
nose the path between the CPU chip and the real ROM). 
The instruction cache lines are loaded in the reverse order. If the fetch_count(9,0) is 
zero, then no instruction cache lines are loaded. Since the valid bits are already cleared 
by the BiST operation, the first instruction fetch misses in the instruction cache and 
the chip seeks instructions from the offchip memory. 


Table 11-4 Icache Bit Fields in an SROM Line 


Fetch Bit  Icache Data                          Fetch Bit  Icache Data                          Fetch Bit  Icache Data 

0          par-MBZ                              86         par-MBZ                              172        lp_train 
1          c[3]                                 87         c[0]                                 173:175    lp_src(2:0) 
2:27       i[3](25,20,24,19,23,18,22,17,        88:113     i[0](25,20,24,19,23,18,22,17,        176:181    lp_idx(14:9) 
           21,16:0)                                        21,16:0) 
28         c[2]                                 114        c[1]                                 182:186    lp_idx(8:4) 
29:42      i[2](25,20,24,19,                    115:128    i[1](25,20,24,19,                    187        lp_idx(15) 
           23,18,22,17,21,16:12)                           23,18,22,17,21,16:12) 
43         parity                               129        parity                               188:192    lp_ssp[4:0] 
44:55      i[2](11:0)                           130:141    i[1](11:0)                           —          — 
56         dv[3]                                142        dv[0]                                —          — 
57:59      iq[3](2:0)                           143:145    iq[0](2:0)                           —          — 
60:65      i[3](26:31)                          146:151    i[0](26:31)                          —          — 
66:68      ea[3](2:0)                           152:154    ea[0](2:0)                           —          — 
69         dv[2]                                155        dv[1]                                —          — 
70:72      iq[2](2:0)                           156:158    iq[1](2:0)                           —          — 
73:78      i[2](26:31)                          159:164    i[1](26:31)                          —          — 
79:81      ea[2](2:0)                           165:167    ea[1](2:0)                           —          — 
82:85      tr(7:4)                              168:171    tr(0:3)                              —          — 
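One way to sanity-check a transcription of Table 11-4 is to confirm that the listed fetch-bit ranges tile bits 0 through 192 with no gaps or overlaps. The sketch below does that; the field labels are shortened, the `lp_` spelling of the line-prediction fields is a reading of the printed table, and the helper names are hypothetical.

```python
# (first bit, last bit, label) for each entry of Table 11-4, in shift order.
FIELDS = [
    (0, 0, "par-MBZ"), (1, 1, "c[3]"), (2, 27, "i[3] bits 25:0 (permuted)"),
    (28, 28, "c[2]"), (29, 42, "i[2] bits 25:12 (permuted)"), (43, 43, "parity"),
    (44, 55, "i[2](11:0)"), (56, 56, "dv[3]"), (57, 59, "iq[3](2:0)"),
    (60, 65, "i[3](26:31)"), (66, 68, "ea[3](2:0)"), (69, 69, "dv[2]"),
    (70, 72, "iq[2](2:0)"), (73, 78, "i[2](26:31)"), (79, 81, "ea[2](2:0)"),
    (82, 85, "tr(7:4)"),
    (86, 86, "par-MBZ"), (87, 87, "c[0]"), (88, 113, "i[0] bits 25:0 (permuted)"),
    (114, 114, "c[1]"), (115, 128, "i[1] bits 25:12 (permuted)"), (129, 129, "parity"),
    (130, 141, "i[1](11:0)"), (142, 142, "dv[0]"), (143, 145, "iq[0](2:0)"),
    (146, 151, "i[0](26:31)"), (152, 154, "ea[0](2:0)"), (155, 155, "dv[1]"),
    (156, 158, "iq[1](2:0)"), (159, 164, "i[1](26:31)"), (165, 167, "ea[1](2:0)"),
    (168, 171, "tr(0:3)"),
    (172, 172, "lp_train"), (173, 175, "lp_src(2:0)"), (176, 181, "lp_idx(14:9)"),
    (182, 186, "lp_idx(8:4)"), (187, 187, "lp_idx(15)"), (188, 192, "lp_ssp[4:0]"),
]


def is_contiguous(fields, first=0, last=192) -> bool:
    """True if the ranges cover first..last exactly once, in order."""
    expected = first
    for lo, hi, _ in fields:
        if lo != expected or hi < lo:
            return False
        expected = hi + 1
    return expected == last + 1


def field_for_bit(bit: int) -> str:
    """Look up which Icache field a given fetch bit belongs to."""
    for lo, hi, label in FIELDS:
        if lo <= bit <= hi:
            return label
    raise ValueError("fetch bit out of range 0..192")
```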





Refer to the Alpha Motherboards Software Developer’s Kit (SDK) for example C code 
that calculates the predecode values of a serial Icache load. 


11.6 Notes on IEEE 1149.1 Operation and Compliance 


1. IEEE 1149.1 port pins on the 21264A are not pulled up or pulled down on the chip. 
The necessary pull-up or pull-down function must be implemented on the board. 


2. Tms_H should not change when Trst_L is being deasserted. 


References 


IEEE Std. 1149.1-1993 A Test Access Port and Boundary Scan Architecture. 


See Appendix B for a listing of the Boundary-Scan Register. 
A 


Alpha Instruction Set 





This appendix provides a summary of the Alpha instruction set and describes the 
21264A IEEE floating-point conformance. It is organized as follows: 


• Alpha instruction summary 

• Reserved opcodes 

• IEEE floating-point instructions 

• VAX floating-point instructions 

• Independent floating-point instructions 

• Opcode summary 

• Required PALcode function codes 

• IEEE floating-point conformance 


A.1 Alpha Instruction Summary 


This section contains a summary of all Alpha architecture instructions. All values are in 
hexadecimal radix. Table A—1 describes the contents of the Format and Opcode col- 
umns that are in Table A—2. 


Table A-1 Instruction Format and Opcode Notation 


                         Format    Opcode 
Instruction Format       Symbol    Notation    Meaning 

Branch                   Bra       oo          oo is the 6-bit opcode field. 

Floating-point           F-P       oo.fff      oo is the 6-bit opcode field. 
                                               fff is the 11-bit function code field. 

Memory                   Mem       oo          oo is the 6-bit opcode field. 

Memory/function code     Mfc       oo.ffff     oo is the 6-bit opcode field. 
                                               ffff is the 16-bit function code in the displacement 
                                               field. 

Memory/branch            Mbr       oo.h        oo is the 6-bit opcode field. 
                                               h is the high-order 2 bits of the displacement field. 

Operate                  Opr       oo.ff       oo is the 6-bit opcode field. 
                                               ff is the 7-bit function code field. 

PALcode                  Pcd       oo          oo is the 6-bit opcode field; the particular PALcode 
                                               instruction is specified in the 26-bit function 
                                               code field. 
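To make the notation concrete, the sketch below extracts the opcode and function fields from a 32-bit instruction word and prints them in the oo.ff style of Table A-1. The field widths come from the table; the bit positions (opcode in bits 31:26, Operate function in bits 11:5, floating-point function in bits 15:5, displacement in bits 15:0) are taken from the Alpha architecture's instruction formats rather than from this appendix, and the function name is hypothetical.

```python
def notation(word: int, fmt: str) -> str:
    """Render a 32-bit Alpha instruction word in Table A-1 opcode notation."""
    opcode = (word >> 26) & 0x3F               # oo: 6-bit opcode field
    if fmt in ("Bra", "Mem", "Pcd"):
        return f"{opcode:02X}"
    if fmt == "Opr":                           # ff: 7-bit function code
        return f"{opcode:02X}.{(word >> 5) & 0x7F:02X}"
    if fmt == "F-P":                           # fff: 11-bit function code
        return f"{opcode:02X}.{(word >> 5) & 0x7FF:03X}"
    if fmt == "Mfc":                           # ffff: 16-bit displacement field
        return f"{opcode:02X}.{word & 0xFFFF:04X}"
    if fmt == "Mbr":                           # h: high-order 2 displacement bits
        return f"{opcode:02X}.{(word >> 14) & 0x3:X}"
    raise ValueError(f"unknown format symbol: {fmt}")


if __name__ == "__main__":
    print(notation(0x40220403, "Opr"))  # an ADDQ encoding
    print(notation(0x58221403, "F-P"))  # an ADDT encoding
    print(notation(0x60004000, "Mfc"))  # an MB encoding
    print(notation(0x6BFA8001, "Mbr"))  # a RET encoding
```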


Qualifiers for operate instructions are shown in Table A-2. Qualifiers for IEEE and 
VAX floating-point instructions are shown in Tables A—5 and A-6, respectively. 


Table A-2 Architecture Instructions 


Mnemonic    Format    Opcode    Description 

ADDF        F-P       15.080    Add F_floating 
ADDG        F-P       15.0A0    Add G_floating 
ADDL        Opr       10.00     Add longword 
ADDL/V      Opr       10.40     Add longword with integer overflow enable 
ADDQ        Opr       10.20     Add quadword 
ADDQ/V      Opr       10.60     Add quadword with integer overflow enable 
ADDS        F-P       16.080    Add S_floating 
ADDT        F-P       16.0A0    Add T_floating 
AMASK       Opr       11.61     Architecture mask 
AND         Opr       11.00     Logical product 
BEQ         Bra       39        Branch if = zero 
BGE         Bra       3E        Branch if ≥ zero 
BGT         Bra       3F        Branch if > zero 
BIC         Opr       11.08     Bit clear 
BIS         Opr       11.20     Logical sum 
BLBC        Bra       38        Branch if low bit clear 
BLBS        Bra       3C        Branch if low bit set 
BLE         Bra       3B        Branch if ≤ zero 
BLT         Bra       3A        Branch if < zero 
BNE         Bra       3D        Branch if ≠ zero 
BR          Bra       30        Unconditional branch 
Table A-2 Architecture Instructions (Continued) 


Mnemonic    Format    Opcode    Description 

BSR         Mbr       34        Branch to subroutine 
CALL_PAL    Pcd       00        Trap to PALcode 
CMOVEQ      Opr       11.24     CMOVE if = zero 
CMOVGE      Opr       11.46     CMOVE if ≥ zero 
CMOVGT      Opr       11.66     CMOVE if > zero 
CMOVLBC     Opr       11.16     CMOVE if low bit clear 
CMOVLBS     Opr       11.14     CMOVE if low bit set 
CMOVLE      Opr       11.64     CMOVE if ≤ zero 
CMOVLT      Opr       11.44     CMOVE if < zero 
CMOVNE      Opr       11.26     CMOVE if ≠ zero 
CMPBGE      Opr       10.0F     Compare byte 
CMPEQ       Opr       10.2D     Compare signed quadword equal 
CMPGEQ      F-P       15.0A5    Compare G_floating equal 
CMPGLE      F-P       15.0A7    Compare G_floating less than or equal 
CMPGLT      F-P       15.0A6    Compare G_floating less than 
CMPLE       Opr       10.6D     Compare signed quadword less than or equal 
CMPLT       Opr       10.4D     Compare signed quadword less than 
CMPTEQ      F-P       16.0A5    Compare T_floating equal 
CMPTLE      F-P       16.0A7    Compare T_floating less than or equal 
CMPTLT      F-P       16.0A6    Compare T_floating less than 
CMPTUN      F-P       16.0A4    Compare T_floating unordered 
CMPULE      Opr       10.3D     Compare unsigned quadword less than or equal 
CMPULT      Opr       10.1D     Compare unsigned quadword less than 
CPYS        F-P       17.020    Copy sign 
CPYSE       F-P       17.022    Copy sign and exponent 
CPYSN       F-P       17.021    Copy sign negate 
CTLZ        Opr       1C.32     Count leading zero 
CTPOP       Opr       1C.30     Count population 
CTTZ        Opr       1C.33     Count trailing zero 
CVTDG       F-P       15.09E    Convert D_floating to G_floating 
CVTGD       F-P       15.0AD    Convert G_floating to D_floating 
CVTGF       F-P       15.0AC    Convert G_floating to F_floating 
Table A-2 Architecture Instructions (Continued) 


Mnemonic Format Opcode Description 
CVTGQ F-P 15.0AF Convert G_floating to quadword 
CVTLQ F-P 17.010 Convert longword to quadword 
CVTQF F-P 15.0BC Convert quadword to F_floating 
CVTQG F-P 15.0BE Convert quadword to G_floating 
CVTQL F-P 17.030 Convert quadword to longword 
CVTQS F-P 16.0BC Convert quadword to S_floating 
CVTQT F-P 16.0BE Convert quadword to T_floating 
CVTST F-P 16.2AC Convert S_floating to T_floating 
CVTTQ F-P 16.0AF Convert T_floating to quadword 
CVTTS F-P 16.0AC Convert T_floating to S_floating 
DIVF F-P 15.083 Divide F_floating 
DIVG F-P 15.0A3 Divide G_floating 
DIVS F-P 16.083 Divide S_floating 
DIVT F-P 16.0A3 Divide T_floating 
ECB Mfc 18.E800 Evict cache block 
EQV Opr 11.48 Logical equivalence 
EXCB Mfc 18.0400 Exception barrier 
EXTBL Opr 12.06 Extract byte low 
EXTLH Opr 12.6A Extract longword high 
EXTLL Opr 12.26 Extract longword low 
EXTQH Opr 12.7A Extract quadword high 
EXTQL Opr 12.36 Extract quadword low 
EXTWH Opr 12.5A Extract word high 
EXTWL Opr 12.16 Extract word low 
FBEQ Bra 31 Floating branch if = zero 
FBGE Bra 36 Floating branch if ≥ zero 
FBGT Bra 37 Floating branch if > zero 
FBLE Bra 33 Floating branch if ≤ zero 
FBLT Bra 32 Floating branch if < zero 
FBNE Bra 35 Floating branch if ≠ zero 
FCMOVEQ F-P 17.02A FCMOVE if = zero 
FCMOVGE F-P 17.02D FCMOVE if ≥ zero 
Table A-2 Architecture Instructions (Continued) 


Mnemonic         Format    Opcode     Description 

FCMOVGT          F-P       17.02F     FCMOVE if > zero 
FCMOVLE          F-P       17.02E     FCMOVE if ≤ zero 
FCMOVLT          F-P       17.02C     FCMOVE if < zero 
FCMOVNE          F-P       17.02B     FCMOVE if ≠ zero 
FETCH            Mfc       18.8000    Prefetch data 
FETCH_M          Mfc       18.A000    Prefetch data, modify intent 
FTOIS            F-P       1C.78      Floating to integer move, S_floating 
FTOIT            F-P       1C.70      Floating to integer move, T_floating 
IMPLVER          Opr       11.6C      Implementation version 
INSBL            Opr       12.0B      Insert byte low 
INSLH            Opr       12.67      Insert longword high 
INSLL            Opr       12.2B      Insert longword low 
INSQH            Opr       12.77      Insert quadword high 
INSQL            Opr       12.3B      Insert quadword low 
INSWH            Opr       12.57      Insert word high 
INSWL            Opr       12.1B      Insert word low 
ITOFF            F-P       14.014     Integer to floating move, F_floating 
ITOFS            F-P       14.004     Integer to floating move, S_floating 
ITOFT            F-P       14.024     Integer to floating move, T_floating 
JMP              Mbr       1A.0       Jump 
JSR              Mbr       1A.1       Jump to subroutine 
JSR_COROUTINE    Mbr       1A.3       Jump to subroutine return 
LDA              Mem       08         Load address 
LDAH             Mem       09         Load address high 
LDBU             Mem       0A         Load zero-extended byte 
LDF              Mem       20         Load F_floating 
LDG              Mem       21         Load G_floating 
LDL              Mem       28         Load sign-extended longword 
LDL_L            Mem       2A         Load sign-extended longword locked 
LDQ              Mem       29         Load quadword 
LDQ_L            Mem       2B         Load quadword locked 
LDQ_U            Mem       0B         Load unaligned quadword 
Table A-2 Architecture Instructions (Continued) 


Mnemonic    Format    Opcode     Description 

LDS         Mem       22         Load S_floating 
LDT         Mem       23         Load T_floating 
LDWU        Mem       0C         Load zero-extended word 
MAXSB8      Opr       1C.3E      Vector signed byte maximum 
MAXSW4      Opr       1C.3F      Vector signed word maximum 
MAXUB8      Opr       1C.3C      Vector unsigned byte maximum 
MAXUW4      Opr       1C.3D      Vector unsigned word maximum 
MB          Mfc       18.4000    Memory barrier 
MF_FPCR     F-P       17.025     Move from FPCR 
MINSB8      Opr       1C.38      Vector signed byte minimum 
MINSW4      Opr       1C.39      Vector signed word minimum 
MINUB8      Opr       1C.3A      Vector unsigned byte minimum 
MINUW4      Opr       1C.3B      Vector unsigned word minimum 
MSKBL       Opr       12.02      Mask byte low 
MSKLH       Opr       12.62      Mask longword high 
MSKLL       Opr       12.22      Mask longword low 
MSKQH       Opr       12.72      Mask quadword high 
MSKQL       Opr       12.32      Mask quadword low 
MSKWH       Opr       12.52      Mask word high 
MSKWL       Opr       12.12      Mask word low 
MT_FPCR     F-P       17.024     Move to FPCR 
MULF        F-P       15.082     Multiply F_floating 
MULG        F-P       15.0A2     Multiply G_floating 
MULL        Opr       13.00      Multiply longword 
MULL/V      Opr       13.40      Multiply longword with integer overflow enable 
MULQ        Opr       13.20      Multiply quadword 
MULQ/V      Opr       13.60      Multiply quadword with integer overflow enable 
MULS        F-P       16.082     Multiply S_floating 
MULT        F-P       16.0A2     Multiply T_floating 
ORNOT       Opr       11.28      Logical sum with complement 
PERR        Opr       1C.31      Pixel error 
PKLB        Opr       1C.37      Pack longwords to bytes 
Table A-2 Architecture Instructions (Continued) 


Mnemonic    Format    Opcode     Description 

PKWB        Opr       1C.36      Pack words to bytes 
RC          Mfc       18.E000    Read and clear 
RET         Mbr       1A.2       Return from subroutine 
RPCC        Mfc       18.C000    Read process cycle counter 
RS          Mfc       18.F000    Read and set 
S4ADDL      Opr       10.02      Scaled add longword by 4 
S4ADDQ      Opr       10.22      Scaled add quadword by 4 
S4SUBL      Opr       10.0B      Scaled subtract longword by 4 
S4SUBQ      Opr       10.2B      Scaled subtract quadword by 4 
S8ADDL      Opr       10.12      Scaled add longword by 8 
S8ADDQ      Opr       10.32      Scaled add quadword by 8 
S8SUBL      Opr       10.1B      Scaled subtract longword by 8 
S8SUBQ      Opr       10.3B      Scaled subtract quadword by 8 
SEXTB       Opr       1C.00      Sign extend byte 
SEXTW       Opr       1C.01      Sign extend word 
SLL         Opr       12.39      Shift left logical 
SQRTF       F-P       14.08A     Square root F_floating 
SQRTG       F-P       14.0AA     Square root G_floating 
SQRTS       F-P       14.08B     Square root S_floating 
SQRTT       F-P       14.0AB     Square root T_floating 
SRA         Opr       12.3C      Shift right arithmetic 
SRL         Opr       12.34      Shift right logical 
STB         Mem       0E         Store byte 
STF         Mem       24         Store F_floating 
STG         Mem       25         Store G_floating 
STL         Mem       2C         Store longword 
STL_C       Mem       2E         Store longword conditional 
STQ         Mem       2D         Store quadword 
STQ_C       Mem       2F         Store quadword conditional 
STQ_U       Mem       0F         Store unaligned quadword 
STS         Mem       26         Store S_floating 
STT         Mem       27         Store T_floating 
Table A-2 Architecture Instructions (Continued) 


Mnemonic Format Opcode Description 

STW Mem 0D Store word 

SUBF F-P 15.081 Subtract F_floating 

SUBG F-P 15.0A1 Subtract G_floating 

SUBL Opr 10.09 Subtract longword 

SUBL/V Opr 10.49 Subtract longword with integer overflow enable 
SUBQ Opr 10.29 Subtract quadword 

SUBQ/V Opr 10.69 Subtract quadword with integer overflow enable 
SUBS F-P 16.081 Subtract S_floating 

SUBT F-P 16.0A1 Subtract T_floating 

TRAPB Mfc 18.0000 Trap barrier 

UMULH Opr 13.30 Unsigned multiply quadword high 

UNPKBL Opr 1C.35 Unpack bytes to longwords 

UNPKBW Opr 1C.34 Unpack bytes to words 

WH64 Mfc 18.F800 Write hint — 64 bytes 

WMB Mfc 18.4400 Write memory barrier 

XOR Opr 11.40 Logical difference 

ZAP Opr 12.30 Zero bytes 

ZAPNOT Opr 12.31 Zero bytes not 


A.2 Reserved Opcodes 


This section describes the opcodes that are reserved in the Alpha architecture. They can 
be reserved for Compaq or for PALcode. 


A.2.1 Opcodes Reserved for Compaq 
Table A-3 lists opcodes reserved for Compaq. 
Table A-3 Opcodes Reserved for Compaq 


Mnemonic    Opcode    Mnemonic    Opcode 
OPC01       01        OPC05       05 
OPC02       02        OPC06       06 
OPC03       03        OPC07       07 
OPC04       04        —           — 

A.2.2 Opcodes Reserved for PALcode 
Table A-4 lists the 21264A-specific instructions. See Chapter 2 for more information. 


Table A-4 Opcodes Reserved for PALcode 


21264A                 Architecture 
Mnemonic    Opcode     Mnemonic        Function 

HW_LD       1B         PAL1B           Performs Dstream load instructions. 

HW_ST       1F         PAL1F           Performs Dstream store instructions. 

HW_REI      1E         PAL1E           Returns instruction flow to the program counter (PC) pointed 
                                       to by EXC_ADDR internal processor register (IPR). 

HW_MFPR     19         PAL19           Accesses the Ibox, Mbox, and Dcache IPRs. 

HW_MTPR     1D         PAL1D           Accesses the Ibox, Mbox, and Dcache IPRs. 


A.3 IEEE Floating-Point Instructions 


Table A-5 lists the hexadecimal value of the 11-bit function code field for the IEEE 
floating-point instructions, with and without qualifiers. The opcode for these 
instructions is 16₁₆. 


Table A-5 IEEE Floating-Point Instruction Function Codes 


Mnemonic  None  /C    /M    /D    /U    /UC   /UM   /UD
ADDS      080   000   040   0C0   180   100   140   1C0
ADDT      0A0   020   060   0E0   1A0   120   160   1E0
CMPTEQ    0A5   -     -     -     -     -     -     -
CMPTLT    0A6   -     -     -     -     -     -     -
CMPTLE    0A7   -     -     -     -     -     -     -
CMPTUN    0A4   -     -     -     -     -     -     -
CVTQS     0BC   03C   07C   0FC   -     -     -     -
CVTQT     0BE   03E   07E   0FE   -     -     -     -
CVTST     See below
CVTTQ     See below
CVTTS     0AC   02C   06C   0EC   1AC   12C   16C   1EC
DIVS      083   003   043   0C3   183   103   143   1C3
DIVT      0A3   023   063   0E3   1A3   123   163   1E3
MULS      082   002   042   0C2   182   102   142   1C2
MULT      0A2   022   062   0E2   1A2   122   162   1E2
SQRTS     08B   00B   04B   0CB   18B   10B   14B   1CB
SQRTT     0AB   02B   06B   0EB   1AB   12B   16B   1EB
SUBS      081   001   041   0C1   181   101   141   1C1
SUBT      0A1   021   061   0E1   1A1   121   161   1E1

Mnemonic  /SU   /SUC  /SUM  /SUD  /SUI  /SUIC /SUIM /SUID
ADDS      580   500   540   5C0   780   700   740   7C0
ADDT      5A0   520   560   5E0   7A0   720   760   7E0
CMPTEQ    5A5   -     -     -     -     -     -     -
CMPTLT    5A6   -     -     -     -     -     -     -
CMPTLE    5A7   -     -     -     -     -     -     -
CMPTUN    5A4   -     -     -     -     -     -     -
CVTQS     -     -     -     -     7BC   73C   77C   7FC
CVTQT     -     -     -     -     7BE   73E   77E   7FE
CVTTS     5AC   52C   56C   5EC   7AC   72C   76C   7EC
DIVS      583   503   543   5C3   783   703   743   7C3
DIVT      5A3   523   563   5E3   7A3   723   763   7E3
MULS      582   502   542   5C2   782   702   742   7C2
MULT      5A2   522   562   5E2   7A2   722   762   7E2
SQRTS     58B   50B   54B   5CB   78B   70B   74B   7CB
SQRTT     5AB   52B   56B   5EB   7AB   72B   76B   7EB
SUBS      581   501   541   5C1   781   701   741   7C1
SUBT      5A1   521   561   5E1   7A1   721   761   7E1

Mnemonic  None  /S
CVTST     2AC   6AC

Mnemonic  None  /C    /V    /VC   /SV   /SVC  /SVI  /SVIC
CVTTQ     0AF   02F   1AF   12F   5AF   52F   7AF   72F

Mnemonic  /D    /VD   /SVD  /SVID /M    /VM   /SVM  /SVIM
CVTTQ     0EF   1EF   5EF   7EF   06F   16F   56F   76F

Programming Note:


In order to use CMPTxx with software completion trap handling, it is necessary to 
specify the /SU IEEE trap mode, even though an underflow trap is not possible. In order 
to use CVTQS or CVTQT with software completion trap handling, it is necessary to 
specify the /SUI IEEE trap mode, even though an underflow trap is not possible. 
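
The regular structure of Table A-5 can be illustrated with a short sketch. It assumes that the 11-bit function code splits into a 3-bit trap-mode field (bits 10:8), a 2-bit rounding-mode field (bits 7:6), and a 6-bit source/operation field (bits 5:0); this layout is an assumption that agrees with the tabulated values, not a statement taken from this appendix. The helper name `ieee_function_code` is hypothetical. CVTST does not follow the pattern (its unqualified code is 2AC) and is not covered.

```python
# Assumed field layout of the 11-bit IEEE function code (see lead-in).
TRAP = {"": 0b000, "U": 0b001, "SU": 0b101, "SUI": 0b111}   # bits <10:8>
ROUND = {"C": 0b00, "M": 0b01, "": 0b10, "D": 0b11}          # bits <7:6>


def ieee_function_code(base, trap="", rnd=""):
    """Combine an unqualified function code with trap and rounding qualifiers.

    base is the value from the "None" column of Table A-5, for example
    0x080 for ADDS. Only the low six bits of base are kept; the trap and
    rounding fields are rebuilt from the qualifier strings.
    """
    low = base & 0x03F
    return (TRAP[trap] << 8) | (ROUND[rnd] << 6) | low


if __name__ == "__main__":
    # ADDS/SUD and SQRTT/UD, printed as three hexadecimal digits.
    print(f"{ieee_function_code(0x080, 'SU', 'D'):03X}")
    print(f"{ieee_function_code(0x0AB, 'U', 'D'):03X}")
```

Under these assumptions the helper reproduces the table entries for the arithmetic instructions; the /V qualifier of the conversion instructions occupies the same bit as /U.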


A.4 VAX Floating-Point Instructions 


Table A-6 lists the hexadecimal value of the 11-bit function code field for the VAX 
floating-point instructions. The opcode for these instructions is 15₁₆. 


Table A-6 VAX Floating-Point Instruction Function Codes 








Mnemonic  None  /C    /U    /UC   /S    /SC   /SU   /SUC
ADDF      080   000   180   100   480   400   580   500
ADDG      0A0   020   1A0   120   4A0   420   5A0   520
CMPGEQ    0A5   -     -     -     4A5   -     -     -
CMPGLE    0A7   -     -     -     4A7   -     -     -
CMPGLT    0A6   -     -     -     4A6   -     -     -
CVTDG     09E   01E   19E   11E   49E   41E   59E   51E
CVTGD     0AD   02D   1AD   12D   4AD   42D   5AD   52D
CVTGF     0AC   02C   1AC   12C   4AC   42C   5AC   52C
CVTGQ     See below
CVTQF     0BC   03C   -     -     -     -     -     -
CVTQG     0BE   03E   -     -     -     -     -     -
DIVF      083   003   183   103   483   403   583   503
DIVG      0A3   023   1A3   123   4A3   423   5A3   523
MULF      082   002   182   102   482   402   582   502
MULG      0A2   022   1A2   122   4A2   422   5A2   522
SQRTF     08A   00A   18A   10A   48A   40A   58A   50A
SQRTG     0AA   02A   1AA   12A   4AA   42A   5AA   52A
SUBF      081   001   181   101   481   401   581   501
SUBG      0A1   021   1A1   121   4A1   421   5A1   521

Mnemonic  None  /C    /V    /VC   /S    /SC   /SV   /SVC
CVTGQ     0AF   02F   1AF   12F   4AF   42F   5AF   52F


A.5 Independent Floating-Point Instructions 


Table A-7 lists the hexadecimal value of the 11-bit function code field for the 
floating-point instructions that are not directly tied to IEEE or VAX floating point. 
The opcode for the following instructions is 17₁₆. 

Table A-7 Independent Floating-Point Instruction Function Codes 

Mnemonic  None  /V    /SV
CPYS      020   -     -
CPYSE     022   -     -
CPYSN     021   -     -
CVTLQ     010   -     -
CVTQL     030   130   530
FCMOVEQ   02A   -     -
FCMOVGE   02D   -     -
FCMOVGT   02F   -     -
FCMOVLE   02E   -     -
FCMOVLT   02C   -     -
MF_FPCR   025   -     -
MT_FPCR   024   -     -


A.6 Opcode Summary 


Table A-8 lists all Alpha opcodes from 00 (CALL_PAL) through 3F (BGT). In the 
table, the column headings that appear over the instructions have a granularity of 8₁₆. 
The rows beneath the Offset column supply the individual hexadecimal number to 
resolve that granularity. 


If an instruction column has a 0 in the right (low) hexadecimal digit, replace that 0 with 
the number to the left of the slash (/) in the Offset column on the instruction’s row. 
If an instruction column has an 8 in the right (low) hexadecimal digit, replace that 8 
with the number to the right of the slash in the Offset column. 


For example, the third row (2/A) under the 10₁₆ column contains the symbol INTS*, 
representing all the integer shift instructions. The opcode for those instructions is 
therefore 12₁₆ because the 0 in 10 is replaced by the 2 in the Offset column. Likewise, the 
third row under the 18₁₆ column contains the symbol JSR*, representing all jump 
instructions. The opcode for those instructions is 1A because the 8 in the heading is 
replaced by the number to the right of the slash in the Offset column. The 
instruction format is listed under the instruction symbol. 
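
The lookup rule described above can be sketched in a few lines. The function name `resolve_opcode` and the string form of the Offset entry (for example "2/A") are illustrative assumptions; the rule itself is the one stated in the text.

```python
def resolve_opcode(column_heading, offset_entry):
    """Return the 6-bit opcode for a cell of the opcode summary table.

    column_heading is the value printed over the column (0x00, 0x08, ...,
    0x38). offset_entry is the text from the Offset column, for example
    "2/A": the left digit applies to columns ending in 0, the right digit
    to columns ending in 8.
    """
    left, right = offset_entry.split("/")
    digit = left if (column_heading & 0xF) == 0 else right
    return (column_heading & 0xF0) | int(digit, 16)


if __name__ == "__main__":
    # The two worked examples from the text: INTS* and JSR*.
    print(f"{resolve_opcode(0x10, '2/A'):02X}")
    print(f"{resolve_opcode(0x18, '2/A'):02X}")
```

The same call with column 38 and offset entry 7/F yields 3F, the BGT opcode named as the last entry of the table.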


Table A-8 Opcode Summary 








Offset  00     08     10     18     20     28     30     38
0/8     PAL*   LDA    INTA*  MISC*  LDF    LDL    BR     BLBC
        (pal)  (mem)  (op)   (mem)  (mem)  (mem)  (br)   (br)
1/9     Res    LDAH   INTL*  \PAL\  LDG    LDQ    FBEQ   BEQ
               (mem)  (op)          (mem)  (mem)  (br)   (br)
2/A     Res    LDBU   INTS*  JSR*   LDS    LDL_L  FBLT   BLT
                      (op)   (mem)  (mem)  (mem)  (br)   (br)
3/B     Res    LDQ_U  INTM*  \PAL\  LDT    LDQ_L  FBLE   BLE
               (mem)  (op)          (mem)  (mem)  (br)   (br)
4/C     Res    LDWU   ITFP*  FPTI*  STF    STL    BSR    BLBS
                                    (mem)  (mem)  (br)   (br)
5/D     Res    STW    FLTV*  \PAL\  STG    STQ    FBNE   BNE
                      (op)          (mem)  (mem)  (br)   (br)
6/E     Res    STB    FLTI*  \PAL\  STS    STL_C  FBGE   BGE
                      (op)          (mem)  (mem)  (br)   (br)
7/F     Res    STQ_U  FLTL*  \PAL\  STT    STQ_C  FBGT   BGT
               (mem)  (op)          (mem)  (mem)  (br)   (br)

Table A-9 explains the symbols used in Table A-8. 

Table A-9 Key to Opcode Summary Used in Table A-8 
Symbol Meaning 
FLTI* IEEE floating-point instruction opcodes 
FLTL* Floating-point operate instruction opcodes 
FLTV* VAX floating-point instruction opcodes 
FPTI* Floating-point to integer register move opcodes 
INTA* Integer arithmetic instruction opcodes 
INTL* Integer logical instruction opcodes 
INTM* Integer multiply instruction opcodes 
INTS* Integer shift instruction opcodes 
ITFP* Integer to floating-point register move opcodes 
JSR* Jump instruction opcodes 
MISC* Miscellaneous instruction opcodes 
PAL* PALcode instruction (CALL_PAL) opcodes 
\PAL\ Reserved for PALcode 
Res Reserved for Compaq 


A.7 Required PALcode Function Codes 


Table A-10 lists opcodes required for all Alpha implementations. The notation used is 
oo.ffff, where oo is the hexadecimal 6-bit opcode and ffff is the hexadecimal 26-bit 
function code. 


Table A-10 Required PALcode Function Codes 


Mnemonic Type Function Code 
DRAINA Privileged 00.0002 
HALT Privileged 00.0000 
IMB Unprivileged 00.0086 


A.8 IEEE Floating-Point Conformance 


The 21264A supports the IEEE floating-point operations defined in the Alpha System 
Reference Manual, Revision 7, and therefore also those defined in the Alpha Architecture 
Handbook, Version 4. Support for a complete implementation of the IEEE Standard for 
Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754-1985) is provided by a 
combination of hardware and software. 


The 21264A provides the following hardware features to facilitate complete support of 
the IEEE standard: 


• The 21264A implements precise exception handling in hardware, as denoted by the 
  AMASK instruction returning bit 9 set. TRAPB instructions are treated as NOPs 
  and are not issued. 


• The 21264A accepts both Signaling and Quiet NaNs as input operands and propagates 
  them as specified by the Alpha architecture. In addition, the 21264A delivers 
  a canonical Quiet NaN when an operation is required to produce a NaN value and 
  none of its inputs are NaNs. Encodings for Signaling NaN and Quiet NaN are 
  defined by the Alpha Architecture Handbook, Version 4. 


• The 21264A accepts infinity operands and implements infinity arithmetic as 
  defined by the IEEE standard and the Alpha Architecture Handbook, Version 4. 


• The 21264A implements SQRT for single (SQRTS) and double (SQRTT) precision 
  in hardware. 

  Note: In addition, the 21264A also implements the VAX SQRTF and SQRTG 
  instructions. 


• The 21264A implements the FPCR[DNZ] bit. When FPCR[DNZ] is set, denormal 
  input operand traps can be avoided for arithmetic operations that include the /S 
  qualifier. When FPCR[DNZ] is clear, denormal input operands for arithmetic 
  operations produce an unmaskable denormal trap. CPYSE/CPYSN, FCMOVxx, and 
  MF_FPCR/MT_FPCR are not arithmetic operations, and pass denormal values 
  without initiating arithmetic traps. 


• The 21264A implements the following disable bits in the floating-point control 
  register (FPCR): 

  - Underflow disable (UNFD) 
  - Overflow disable (OVFD) 
  - Inexact result disable (INED) 
  - Division by zero disable (DZED) 
  - Invalid operation disable (INVD) 

  If one of these bits is set, and an instruction with the /S qualifier set generates the 
  associated exception, the 21264A produces the IEEE nontrapping result and 
  suppresses the trap. These nontrapping responses include correctly signed 
  infinity, largest finite number, and Quiet NaNs as specified by the IEEE 
  standard. 


• The 21264A does not produce a denormal result for the underflow exception. 
  Instead, a true zero (+0) is written to the destination register. In the 21264A, the 
  FPCR underflow to zero (UNDZ) bit must be set if the underflow disable (UNFD) 
  bit is set. If desired, trapping on underflow can be enabled by the instruction and the 
  FPCR, and software may compute the denormal value as defined in the IEEE 
  standard. 


The 21264A records floating-point exception information in two places: 


• The FPCR status bits record the occurrence of all exceptions that are detected, 
  whether or not the corresponding trap is enabled. The status bits are cleared only 
  through an explicit clear command (MT_FPCR); hence, the exception information 
  they record is a summary of all exceptions that have occurred since the last time 
  they were cleared. 


• If an exception is detected and the corresponding trap is enabled by the instruction, 
  and is not disabled by the FPCR control bits, the 21264A will record the 
  condition in the EXC_SUM register and initiate an arithmetic trap. 
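
The two recording paths can be summarized as a toy software model. This is only a sketch of the rules stated in the two items above: the class name, the use of Python sets in place of register bit fields, and the exception labels are all illustrative assumptions and do not describe the hardware implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FpExceptionModel:
    """Toy model of the two places where exceptions are recorded."""

    fpcr_status: set = field(default_factory=set)    # sticky FPCR status bits
    fpcr_disabled: set = field(default_factory=set)  # exceptions disabled in the FPCR
    exc_sum: set = field(default_factory=set)        # conditions recorded in EXC_SUM

    def report(self, exception, trap_enabled_by_instruction):
        """Record a detected exception; return True if a trap is initiated."""
        # The status bit is set whether or not the trap is enabled.
        self.fpcr_status.add(exception)
        # A trap requires the instruction to enable it and the FPCR not
        # to disable it.
        if trap_enabled_by_instruction and exception not in self.fpcr_disabled:
            self.exc_sum.add(exception)
            return True
        return False

    def clear_status(self):
        """Model an explicit clear of the status bits through MT_FPCR."""
        self.fpcr_status.clear()


if __name__ == "__main__":
    model = FpExceptionModel(fpcr_disabled={"underflow"})
    print(model.report("underflow", trap_enabled_by_instruction=True))
    print(model.report("overflow", trap_enabled_by_instruction=True))
    print(sorted(model.fpcr_status), sorted(model.exc_sum))
```

In the model, a disabled exception still leaves its status bit set, which matches the description of the status bits as a summary of everything detected since the last clear.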


The following items apply to Table A-11: 


• The 21264A traps on a denormal input operand for all arithmetic operations unless 
  FPCR[DNZ] = 1. 

• Input operand traps take precedence over arithmetic result traps. 

• The following abbreviations are used: 

  Inf: Infinity 
  QNaN: Quiet NaN 
  SNaN: Signaling NaN 
  CQNaN: Canonical Quiet NaN 


For IEEE instructions with /S, Table A-11 lists all exceptional input and output 
conditions recognized by the 21264A, along with the result and exception 
generated for each condition. 
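
Several of the result values in Table A-11 are ordinary IEEE 754 behavior and can be spot-checked on any IEEE-conformant host. The sketch below uses Python's double-precision floats for this purpose; it exercises the host's arithmetic, not the 21264A, and it cannot observe the exception column. The dictionary name `checks` is an illustrative choice.

```python
import math

inf = math.inf
nan = math.nan

# Each entry pairs a condition from the table with a host-side check.
checks = {
    "effective subtract of two Inf operands gives a NaN": math.isnan(inf - inf),
    "0 * Inf gives a NaN": math.isnan(0.0 * inf),
    "Inf / Inf gives a NaN": math.isnan(inf / inf),
    "A / Inf gives zero": 1.0 / inf == 0.0,
    "Inf / A gives Inf": math.isinf(inf / 2.0),
    "square root of -0 gives -0": math.copysign(1.0, math.sqrt(-0.0)) == -1.0,
    "equality compare with a QNaN is false": not (nan == 1.0),
    "less-than compare with a QNaN is false": not (nan < 1.0),
}

if __name__ == "__main__":
    for description, passed in checks.items():
        print(f"{description}: {passed}")
```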


Table A-11 Exceptional Input and Output Conditions 

                                         21264A Hardware
Alpha Instructions                       Supplied Result             Exception
ADDx SUBx INPUT
  Inf operand                            ±Inf                        (none)
  QNaN operand                           QNaN                        (none)
  SNaN operand                           QNaN                        Invalid Op
  Effective subtract of two Inf operands CQNaN                       Invalid Op
ADDx SUBx OUTPUT
  Exponent overflow                      ±Inf or ±MAX                Overflow
  Exponent underflow                     +0                          Underflow
  Inexact result                         Result                      Inexact
MULx INPUT
  Inf operand                            ±Inf                        (none)
  QNaN operand                           QNaN                        (none)
  SNaN operand                           QNaN                        Invalid Op
  0 * Inf                                CQNaN                       Invalid Op
MULx OUTPUT (same as ADDx)
DIVx INPUT
  QNaN operand                           QNaN                        (none)
  SNaN operand                           QNaN                        Invalid Op
  0/0 or Inf/Inf                         CQNaN                       Invalid Op
  A/0 (A not 0)                          ±Inf                        Div Zero
  A/Inf                                  ±0                          (none)
  Inf/A                                  ±Inf                        (none)
DIVx OUTPUT (same as ADDx)
SQRTx INPUT
  +Inf operand                           +Inf                        (none)
  QNaN operand                           QNaN                        (none)
  SNaN operand                           QNaN                        Invalid Op
  -A (A not 0)                           CQNaN                       Invalid Op
  -0                                     -0                          (none)
SQRTx OUTPUT
  Inexact result                         root                        Inexact
CMPTEQ CMPTUN INPUT
  Inf operand                            True or False               (none)
  QNaN operand                           False for EQ, True for UN   (none)
  SNaN operand                           False for EQ, True for UN   Invalid Op
CMPTLT CMPTLE INPUT
  Inf operand                            True or False               (none)
  QNaN operand                           False                       Invalid Op
  SNaN operand                           False                       Invalid Op
CVTfi INPUT
  Inf operand                            0                           Invalid Op
  QNaN operand                           0                           Invalid Op
  SNaN operand                           0                           Invalid Op
CVTfi OUTPUT
  Inexact result                         Result                      Inexact
  Integer overflow                       Truncated result            Invalid Op
CVTif OUTPUT
  Inexact result                         Result                      Inexact
CVTff INPUT
  Inf operand                            ±Inf                        (none)
  QNaN operand                           QNaN                        (none)
  SNaN operand                           QNaN                        Invalid Op
CVTff OUTPUT (same as ADDx)
FBEQ FBNE FBLT FBLE FBGT FBGE
LDS LDT
STS STT
CPYS CPYSN
FCMOVx


See Section 2.14 for information about the floating-point control register (FPCR). 


B 


21264A Boundary-Scan Register 





This appendix contains the BSDL description of the 21264A boundary-scan register. 


B.1 Boundary-Scan Register 


The Boundary-Scan Register (BSR) on the 21264A is 367 bits long. It is accessed by 
the three public (SAMPLE, EXTEST, CLAMP) instructions. The register operation for 
the public instructions is compliant with the IEEE 1149.1 standard. 


The boundary-scan register covers all input, output, and bidirectional pins with the 
exception of the compliance enable pins and pins that are power-supply-type or analog 
in nature. The BSDL for the boundary-scan register is given in Section B.1.1. 
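
As a reading aid for the BSDL listing in Section B.1.1, the following sketch collects its instruction opcodes and register assignments into lookup tables. The opcode strings, the register grouping, and the 367-bit and 32-bit lengths are transcribed from that listing; the 1-bit bypass register length follows IEEE 1149.1. The function names are hypothetical. Note that the listing assigns CLAMP to the bypass register for scanning, while the pins are held from the boundary-scan register contents.

```python
# Values transcribed from the BSDL listing in Section B.1.1.
INSTRUCTION_OPCODE = {
    "EXTEST": "11011",
    "SAMPLE": "11000",
    "CLAMP": "11010",
    "HIGHZ": "11001",
    "DIE_ID": "11110",
    "BYPASS": "11111",
}

REGISTER_ACCESS = {
    "BOUNDARY": ("EXTEST", "SAMPLE"),
    "BYPASS": ("BYPASS", "HIGHZ", "CLAMP"),
    "DIE_ID": ("DIE_ID",),
}

# BOUNDARY and DIE_ID lengths come from the listing; BYPASS is 1 bit per
# IEEE 1149.1.
REGISTER_LENGTH = {"BOUNDARY": 367, "BYPASS": 1, "DIE_ID": 32}


def selected_register(instruction):
    """Return the data register placed between TDI and TDO."""
    for register, instructions in REGISTER_ACCESS.items():
        if instruction in instructions:
            return register
    raise KeyError(instruction)


def data_scan_length(instruction):
    """Return the number of data-register shifts for one full scan."""
    return REGISTER_LENGTH[selected_register(instruction)]


if __name__ == "__main__":
    for name, opcode in INSTRUCTION_OPCODE.items():
        print(name, opcode, selected_register(name), data_scan_length(name))
```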


B.1.1 BSDL Description of the Alpha 21264A Boundary-Scan Register 


-- alpha21264a.bsdl 

-- The BSDL Description for EV6’s IEEE 1149.1 Circuits 
-- Revision History 
-- Rev  Date    Description 
-- 1.0  Feb 99  First external release 


entity Alpha_21264a is -- (ref B.8) 


generic (PHYSICAL_PIN_MAP : string := "PGA_EV6"); -- (ref B.8.2) 
port ( -- (ref B.8.3) 
TestStat_H        : out bit ; 
SromOE_L          : out bit ; 
SromClk_H         : out bit ; 
SromData_H        : in bit ; 
Reset_L           : in bit ; 
IRQ_H             : in bit_vector (0 to 5) ; 
DcOk_H            : linkage bit ; -- Compliance enable input 
NoConnect_0       : linkage bit ; -- n/c 
NoConnect_1       : linkage bit ; -- n/c 
PllBypass_H       : linkage bit ; 
FrameClk_H        : linkage bit ; 
FrameClk_L        : linkage bit ; 
ClkFwdRst_H       : in bit ; 
BcCheck_H         : inout bit_vector (0 to 15) ; 
BcData_H          : inout bit_vector (0 to 127) ; 
SysData_L         : inout bit_vector (0 to 63) ; 
SysCheck_L        : inout bit_vector (0 to 7) ; 
BcDataInClk_H     : in bit_vector (0 to 7) ; 
SysDataOutClk_L   : out bit_vector (0 to 7) ; 
Spare_7           : linkage bit_vector (0 to 7) ; 
SysDataInClk_H    : in bit_vector (0 to 7) ; 
BcDataOutClk_L    : out bit_vector (0 to 3) ; -- JWB corrected 
BcDataOutClk_H    : out bit_vector (0 to 3) ; -- JWB corrected 
ClkIn_H           : linkage bit ; -- Oscillator 
ClkIn_L           : linkage bit ; -- Oscillator 
PLL_VDD           : linkage bit ; 
EV6Clk_H          : linkage bit ; 
EV6Clk_L          : linkage bit ; 
Spare_4           : linkage bit ; 
Spare_5           : linkage bit ; 
BcTag_H           : inout bit_vector (20 to 42) ; 
BcVref            : linkage bit ; 
BcTagInClk_H      : in bit ; -- Name in model: BcTagClkIn_H 
BcTagParity_H     : inout bit ; 
BcTagShared_H     : inout bit ; 
BcTagDirty_H      : inout bit ; 
BcTagValid_H      : inout bit ; 
BcTagOutClk_L     : out bit ; 
BcTagOutClk_H     : out bit ; 
BcTagOE_L         : out bit ; 
BcTagWr_L         : out bit ; 
BcDataWr_L        : out bit ; 
BcLoad_L          : out bit ; 
BcDataOE_L        : out bit ; 
BcAdd_H           : out bit_vector (4 to 23) ; 
SysAddOut_L       : out bit_vector (0 to 14) ; 
SysAddIn_L        : in bit_vector (0 to 14) ; 
SysAddInClk_L     : in bit ; 
SysAddOutClk_L    : out bit ; -- JWB added 
SysVref           : linkage bit ; -- JWB added 
SysFillValid_L    : in bit ; 
SysDataInValid_L  : in bit ; 
SysDataOutValid_L : in bit ; 
Spare_0           : linkage bit ; -- n/c 
MiscVref          : linkage bit ; 
Spare_2           : linkage bit ; -- n/c 
Tdi_H             : in bit ; 
Tdo_H             : out bit ; 
Trst_L            : in bit ; 
Tck_H             : in bit ; 
Tms_H             : in bit ; 
VSS               : linkage bit_vector (0 to 103) ; 
VDD               : linkage bit_vector (0 to 93) ) ; 
use STD_1149_1_1994.all ; -- (ref B.8.4) 


attribute COMPONENT_CONFORMANCE of Alpha_21264a : entity is "STD_1149_1_1993"; 
attribute PIN_MAP of Alpha_21264a : entity is PHYSICAL_PIN_MAP ; 


constant PGA_EV6 : PIN_MAP_STRING := " " & 
“SysAddIn_L : (BD30, BC29, AY28, BE29, AW27, BA27, BD28, BE27, "& 
- AY26, BC25, BB24, AV24, BD24, BE23, AW23), "& 
"SysAddIncClk_L : BB26, "& 
"SysVref : BA25, "& 
“SysFillvValid_L : BC23, "& 
“SysAddOut_L : (AW33, BE39, BD36, BC35, BA33, AY32, BE35, AV30, "& 
. BB32, BA31, BE33, AW29, BC31, AV28, BB30), "& 
"SysAddOutClk_L : BD34, “& 
"“SysData_L : (F14 , Gi3 , F12 , H12 , H10 , G7 , F6' , KB , "& 
* M6 , N7 , P6 , T8 , V8 , V6 , WI , YO , "& 
" AB8 , AC7 , AD8 , AES , AH6 , AH8 , AJ7 , ALS , "& 
. AP8 , AR7 , AT8 , AV6 , AV10, AW1l1, AV12, AW13, "& 
. F32 , F34 , H34 , G35 , F40 , G39 , K38 , J41 , "& 
" M40 , N39 , P40 , 738 , V40 , W41 , W39 , Y40 , "& 
" AB38, AC39, AD38, AF40, AH38, AJ39, AL41, AK38, "& 
" AN39, AP38, AR39, AT38, AY38, AV36, AW35, AV34),"& 


"SysCheck_L : (L7 , AAS , AK8 , BA13, L39 , AA41, AM40, AY34),“& 
"SysDataInClk_H : (D8 , P4 , AF6 , AY6 , E37 , R43 , AG41, AV40), "& 

“SysDataOutClk_L : (G11 , U7 , AG7 , AY8 , H36 , R41, AH40, AW39), "& 
"SysDataInValid_L : BD22, ar 
"SysDataOutValid_L : BB22, “% 
"BcAdd_H : (B28 , E27 , A29 , G27 , C29 , F28 , B30 , D30 ,k 
" C31 , H28 , G29 , A33 , E31, D32 , B34 , A35 4k 
" B36 , H30 , C35 , E33 ), “% 
“BcDataOE_L S° AZT ; =k 
"BcLoad_L : F26 , “& 
"BceDataWwr_L : D26 , “& 
"BcData_H : (B10 , D1O0 , AS , CS , C3 , E3 , H6 , El 0 *& 
. J3 , K2 , L3 , M2 , T2 , UL , V2 , Y¥4 ak 


. ACl , AD2 , AE3 , AG1 , AK2 , AL3 , AR1 , AP2 ,*& 
i AY2 , BB2 , AWS , BB4 , BB8 , BES , BB1O, BET ,*& 
. G33, C37 , B40 , C41 , C43 , B43 , G41 , F444 ,"k& 
. K44 , N41 , M44 , P42 , U43 , V44 , Y42 , AB44 ,& 
> AD42, AE43, AF42, AJ45, AK42, AN45, AP44, AN41 ,"& 
ee AW45, AU41, AY44, BA43, BC43, BD42, BB38, BE41 ,"& 
: Cll, A7 , C9 , B6 , B4a , D4 , GS , D2 a & 
- H4 , Gl ,NS5S , L1 , Nl , U3 , WS , WI ok 
. AB2 , AC3 , AD4 , AF4 , AJ3 , AK4 , ANI , AM4 ,“k& 
: AUS , BAl , BA3 , BC3 , BD6 , BAI , BCI , AY12 ,"& 
' A39 , D36 , A41l , B42 , D42 , D44 , H40 , H42 ,"& 
oe G45 , L43 , L45 , N45 , T44 , V45 , W45 , AASB ,"& 
. AC43, AD44, AE41, AG45, AK44, AL43, AM42, AR45 ,"é 
* AP40, BA45, AV42, BB44, BB42, BC41, BA37, BD40),"& 


"BcCheck_H : (F2 , AB4 , AT2 , BC11, M38 , AB42, AU43, BC3? ,*& 
" M8 , AA3 , AW1l , BDIO, E45 , AC45, AT44, BB35),"& 
"BcDataInClk_H : (E7 » R3 , AH2 , BCS , F38 , U39 , AH44, AY4D),"& 
"Spare_7 : (FB , T4 =, AJ1, BD4 , E39 , V38 , AJ43, BA39),"& 
"BcDataOutClk_L : (K4 , Av4 , K42 , AT42), "& 
"BcDataOutClk_H : (95 , AU3 , J43 , AR43), "& 
"BcTag_H : (E13, H16 , All , B12 , D14, E15 , Al3 , Gi? ,"& 
. C1i5 , H18 , DI6 , B16 , C17 , A17 , E19 , B18 ,*& 
” A1l9 , F20 , D20 , E21 , C21 , D22 , H22 }, "& 
"BcoTagValid_H : B24 , "& 
"BeTagDirty_H P2234 "& 
"BcTagShared_H : G23 , "& 
"BcoTagParity_H : B22 , "& 
"BcTagOE_L : #H24 , "& 
"BcTagwr_L : E25 , "& 
“BcTagiInClk_H : G19, "& 
"BceVref : *F18 , "& 
; "BeTagOutClk_L +4 D24 , " be 
"BcTagOutClk_H : C25 , "& 
"TRO_H : (BA1S, BE13, AwWl7, AV18 BC15, BB16), "& 
"Reset_L : BbD16, "& 
“SromData_H : BCl7, "& 
"SromCLK_H : AW19, "& 
"SromOE_L : BE17, "& 
"Tms_H : BD18, "& 
"Tcok_H : BE19, "& 
"Trst_L : AY20, "& 
"Tdi_H : BA21, "& 
"Tdo_H : BB20, " & 
"TestStat_H : BAIS, "& 
*"ClkIn_H : AM8 , "& 
"ClkIn_L : AN7 , "& 
"FrameClk_H : Avil6, "& 
"FrameClk_L : AW15, “&& 
"Pl1Bypass_H : BDi2, "& 
"NoConnect_0 : BB14, "& 
"NoConnect_1 : BD2 , "& 
"CikFwdRst_H : BEil1, "& 
"EV6C1k_H : <AM6 , "& 
"EV6C1kK_L : AL7 , "& 
"Spare_4 : AT4 , "& 
"Spare_5 : AR3 , "& 
"PLL_VDD : AV8 , "& 
"Spare_0 : BC21, "& 
"MiscVref : AV22, "& 
"Spare_2 : BES , "& 
"“DCOK_H : AY18, "& 
"VSS: (Cl , W3 , ARS , G9 , E17 , G25 , C33 , AA39, “& 
* BA41, R45 , Jl , AG3 , BAS , AWS , BA17, AW25, "& 
* BC33, AE39, A43 , AA45, R1 , AN3 , C7 , C19, "& 
BE25, E35 , AL39, G43 , AE4S5, AA] , AW3 , J7 , "& 


” Ell , BC19, C27 , BA35, AU39, N43 , AL45, AE1, "& 
. BE3 , R7 , BA11, A21 , BC27, A37 , BC39, W43 , "&> 
: AU45, AL1 , ES , AAT , C13 , G21 , E29 , G37, "& 
‘ E41 , AG43, BC45, AU1 , LS, AE7 , BC13, AW21, "& 
. BA29, AW37, L41 , AN43, BC1l , US , AU7 , AlS , "& 
" BE21, A31 , BE37, U41 , AW43, A3  , ACS , AW7 , "& 
* G15 , E23 , G31 , C39 , AC4l, BE43, G3 , AJS , "& 
bs BC7 , AY14, BA23, AW31, J39 , AJ41, C45 , N3 , "& 
7 ANS , A9 , BEI15, A25 , BE31, R39 , AR41, J45, "& 
_ E9 , R5 , AGS , BAT , D38 , T42 , AG39, AW41) ,"& 
“VDD : (B2 , V4 , AP6 , D12 , B20 , H26 , BD32, AM38, "“& 
* BB40, Y44 , H2 , AH4 , AT6 , BBi2, H20 , AV26, "& 
* D34 , AV38, F42 , AF44, P2  , AP4 , BB6 , B14, "& 
: AV20, BD26, BB34, BD38, M42 , AM44, Y2 , AY4, "& 
i B8 , H14 , BD20, D28 , F36 , D40 , v42 , AV44, "& 
: AF2 , D6 , P8 , AV14, F22 , BB28, AY36, K40 , "& 
: AH42, BD44, AM2 , K6 , Y8 , BD14, AY22, F30 , "& 
* B38 , T40 , AP42, AV2 , T6 , AF8 , F16 , A23 , "& 
* AY30, H38 , AB40, AY42, AB6 , BD8 , AY16, F24 , "& 
" B32 , P38 , AD40, B44 , F4 , AD6, FIO , DIB, "& 
: AY24, H32 , Y38 , AK40, H44 , M4 , AK6 , AY10, “& 


“ BB18, B26 , AV32, AF38, AT40, P44 ) nue 
constant numeric_EV6 : PIN_MAP_STRING := " " & 

"SysAddiIn_L : (559 , 536 , 468 , 580 , 445 , 490 , 558 , 579 , "& 
ts 467 , 534 , 511 , 421 , 556 , 577 , 443 ), "& 
"SysAddInclk_L : 512 , "& 
"Sysvref >: 489 , "& 
"SysFillvalid_L : 533, "& 
"SysAddOut_L : (448 , 585 , 562 , 539 , 493 , 470 , 583 , 424 , "& 
- 515 , 492 , 582 , 446 , 537 , 423 , 514 ), "& 
“SyvsAddoOutClk_L : 561, "& 
"SysData_L : (118 , 140 , 117 , 161 , 160 , 137 , 114 , 189 , "& 
bs 204 , 213 , 220 , 237 , 253 , 252 , 261 , 268 , "“& 
bs 285 , 293 , 301 , 308 , 332 , 333, 341 , 356, "& 
2 381 , 389 , 397 , 412 , 414 , 437 , 415 , 438 , "& 


. 127, 128 , 172 , 151, 131 , 153 , 190 , 183 , "& 
. 207 , 214 , 223, 238 , 255 , 263 , 262 , 271 , "& 
” 286 , 294 , 302 , 319 , 334 , 342 , 359 , 350 , “"& 


" 374 , 382 , 390 , 398 , 473 , 427 , 449 , 426 ),"& 
*SysCheck_L : (197 , 276 , 349 , 483 , 198 , 279 , 367 , 471 ),"& 
"SysDataInClk_H : (70 , 219 , 316 , 457 , 107 , 232 , 327 , 429 ),"& 
"SysDataOutClk_L : (139 , 245 , 325 , 458 , 173 , 231 , 335 , 451 ),"& 
"SysDataInValid_L : 555, "& 
"SysDataOutValid_L : 510, "& 
"BcAdd_H : (35 , 102 , 14 , 147 , 58 , 125, 36 , 81, "& 
. 59 , 169 , 148 , 16 , 104 , 82 , 38 , 17, "& 
" 39 , 170 , 61 , 105 ), "& 
"BcDataOE_L : 13 , "& 
"BcLoad_L : 124 , "& 
"BcDataWr_L : 79 , "& 
"“BcData_H : (26 , 71 =, 2 , 46 , 45 , 90 , 159 , 89 , "& 
* 179 , 186 , 195 , 202 , 234 , 242 , 250, 267 , "& 


. 290 , 298 , 307 , 322 , 346 , 355 , 386 , 378, "& 
. 455 , 500 , 434 , 501 , 503 , 568 , 504 , 569 , "& 
‘ 150 , 62 , 41 , 64 , 65 , 110, 154 , 133 , “& 


: 193, 215 , 209 , 224 , 248 , 257 , 272 , 289 , "& 
: 304 , 312 , 320 , 345 , 352 , 377 , 385 , 375 , *& 
7 454 , 407 , 476 , 498 , 543 , 565 , 518 , 586, “& 
7 49 ,3 , 48 , 24 , 23 , 68 , 136, 67, "& 
. 158 , 134 , 212 , 194 , 210 , 243 , 260 , 258 , °& 
i 282 , 291 , 299 , 315 , 339 , 347 , 370 , 363 , "& 


: 404 , 477 , 478 , 523 , 547 , 481 , 526 , 460, "& 
‘ 19 , 84 , 20 , 42 , 87 , 88 , 175 , 176, "& 
“ 156 , 200 , 201 , 217 , 241 , 249 , 265 , 280 , “& 
i 296 , 305 , 311 , 329 , 353 , 360 , 368 , 393 , “& 
. 383 , 499 , 430 , 521 , 520 , 542 , 495 , 564 ),"& 


“BoCheck_H : (112 , 283 , 394 , 527 , 206 , 288 , 408 , 540 , "& 
205 , 275 , 432 , 549 , 111 , 297 , 401 , 517 ),"& 
“BcDataInClk_H >: (92 , 227 , 330 , 524 , 130 , 246 , 337 , 474 ),"& 
"Spare_7 : (115 , 235 , 338 , 546 , 108 , 254 , 344 , 496 },"& 
“BcDataOutClk_L : (187 , 411 , 192 , 400 ), "& 
"BcDataOutClk_H : (180 , 403 , 184 , 392 ), "& 
"“BcTag_H : (95 , 163 , 5 , 27 , 73 , 96 , 6 , 142 , *& 
: 51 , 164 , 74 , 29 , 52 , 8 , 98 , 30 , "& 
ne 9 , 121 , 76 , 99 , S&4 , 77 +, 166 ), “& 
"BeoTagValid_H 0 33° 4, “& 
"BcTagDirty_H 3550 y "& 
"BcTagShared_H : 145 , "& 
"BceTagParity_H o. 232. 2G "& 
“BcTagOE_L : 167 , "& 
“BceTagwr_L : 101, "& 
"“BcTagInClk_H : 143 , "& 
"BeVref : 120 , "& 
"“BeTagOutClk_L : 78 , "& 
*“BoTagOutClk_H : 56 , “& 
"IRQO_H : (484 , 572 , 440 , 418 , 529 , 507 ), "& 
"Reset_L :) “852>; “& 
"SromData_H : 530 , "& 
“SromCLK_H : 441 , "& 
"SromOE_L : 574 , "& 
“Tms_H 2. 2553°, "& 
"“Tcok_H = 6 S754 "& 
"Trst_L : 464 , "& 
“Tdi_H : 487 , "& 
“Tdo_H : 509 , *& 
"TestStat_H : 486 , "& 
"ClkIn_H : 365 , "& 
*ClkIn_L : 373 , "& 
“FrameClk_H : 417 , "& 
"FrameClk_L : 439 , "& 
“PllBypass_H : 550, "& 
“NoConnect_0O : 506, "& 
“NoConnect_1 : $45 , "& 
“ClkFwdRst_H : 571 , “& 
“EV6C1k_H >: 364 , "& 
“EV6C1k_L - 9357; "& 
“Spare_4 : 395 , "& 
"“Spare_5 : 387 , "& 
"PLL_VDD : 413 , "& 
"“Spare_0 : $32 , "& 
“MiscVref : 420 , "& 
“Spare_2 : S70 , "“& 
“DCOK_H : 463 , "& 
"VSS : (44 , 259 , 388 , 138 , 97 , 146 , 60 , 278 , "& 
" 497 , 233 , 178 , 323 , 479 , 436 , 485 , 444 , "& 
. 538 , 310,21 , 281 , 226 , 371 , 47 + 53 4 %& 
. 578 4.106, 358 ; 155 5. 313... 274 , 433 , 181 . "s 
. 94 , 531 , 57 , 494 , 406 , 216 , 361 , 306, "& 
. 567 , 229 , 482, 10 , 535 , 18 , 541 , 264, "& 
. 409 , 354 , 91 , 277, 50 , 144 , 103 , 152, "& 
* 109 , 328 , 544 , 402 , 196 , 309 , 528 , 442 , "& 


. 491 , 450 , 199 , 376 , 522 , 244 , 405 , 7 1 "& 
. 576 , 15 , 584 , 247 , 453 , 1 , 292 , 435 , "& 
, 141 , 100 , 149 , 63 , 295 , 587 , 135 , 340, "& 
. 525 , 461 , 488 , 447 , 182 , 343 , 66 , 211, "& 
. 372 , 4 . 573, 12 , 581 , 230 , 391 , 185 ,"& 
. 93 , 228 , 324 , 480 , 85 , 240 , 326 , 452 ),"& 
“VDD >: (22 , 251 , 380, 72 , 31 , 168 , 560 , 366, "& 
* 519 , 273. ,, 157 , 331 , 396 , 505 5 165 },. 422 , "& 
. 83. , 428 , 132 , 321 , 218 , 379 , 502 , 28 , "& 


. 419 , 557 , 516 , 563 , 208 , 369 , 266 , 456, "& 
. 25 , 162 , 554, 80 , 129 , 86 , 256 , 431, "& 


" 314 , 69 4 221 ,. 416 , 122 , 513, 472, 191 , *6 
" 336 , 566 , 362, 188 , 269 , 551 , 465 , 126, "& 
" 40 , 239 , 384 , 410 , 236, 317, 119,11 , "& 
“ 469 , 174 , 287 , 475 , 284 , 548 , 462 , 123, "& 
" 37. , 222 , 303 , 43 , 113, 300, 116,75 , "& 
" 466 , 171 , 270 , 351 , 177 , 203 , 348 , 459, "& 
“ 508 , 34 , 425, 318 , 399 , 225) ‘ 


attribute PORT_GROUPING of Alpha_21264a : entity is -- (Ref B.8.8). See Note 4. 
"Differential_Voltage ( (CLKIN_H), (CLKIN_L) )" ; 


attribute TAP_SCAN_CLOCK of Tck_H : signal is (5.0e6, LOW); 
attribute TAP_SCAN_IN of Tdi_H : signal is TRUE; 
attribute TAP_SCAN_OUT of Tdo_H : signal is TRUE; 
attribute TAP_SCAN_MODE of Tms_H : signal is TRUE; 
attribute TAP_SCAN_RESET of Trst_L : signal is TRUE; 


attribute COMPLIANCE_PATTERNS of Alpha_21264a : entity is -- (Ref B.8.10). See Note 4. 
"(DcOk_H), (1)" ; 


attribute INSTRUCTION_LENGTH of Alpha_21264a : entity is 5 ; 
attribute INSTRUCTION_OPCODE of Alpha_21264a : entity is 
"EXTEST (11011),"& -- No longer mandated to be (00000)! 
"SAMPLE (11000),"& -- JWB changed "PRELOAD" to "SAMPLE" 
"CLAMP (11010),"& 
"HIGHZ (11001),"& 
"DIE_ID (11110),"& 
"BYPASS (11111)"; 
attribute INSTRUCTION_CAPTURE of Alpha_21264a : entity is "00001" ; 
attribute INSTRUCTION_PRIVATE of Alpha_21264a : entity is "Private"; -- See Note 4. 
attribute REGISTER_ACCESS of Alpha_21264a : entity is -- (ref B.8.13) 
"BOUNDARY (EXTEST, SAMPLE)," & -- Redundant. Added for completeness 
"BYPASS (BYPASS, HIGHZ, CLAMP)," & -- ditto 
"DIE_ID[32] (DIE_ID) "; 


attribute BOUNDARY_LENGTH of Alpha_21264a : entity is 367 ; 


attribute BOUNDARY_REGISTER of Alpha_21264a : entity is 


-- sean cell safe entrl disable disable 
-- cell type port function | cell value state 
~--= |--2-+|----+ |------------------- |-------- |---|----|------ |-------------- 
" 366 ( BC_2, TestStat_H, OUTPUT2, x ), "& -- 
" 365 ( BC_2, SromOE_L, OUTPUT2, X ), "& == 
" 364 ( BC_2, SromClk_H, OUTPUT2, x ), "& -- 
" 363 ( BC_2, SromData_H, INPUT, x leg Ske <= 
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" 362 ( BC_3, reset_L, INPUT, x ), "& -- 
" 361 ( BC_3, IRO_H(5), INPUT, x ), “& -- 
" 360 ( BC_3, IRQ_H(4), INPUT, x ), "& -- 
" 359 ( BC_3, IRQ_H(3), INPUT, x ), “& -- 
" 358 ( BC_3, IRQ_H(2), INPUT, x ), "& -- 
" 357 ( BC_3, IRQ_H(1), INPUT, x ), “& -- 
“ 356 ( BC_3, IRQ_H(0), INPUT, x ), "& -- 
" 355 ( BC_3, ClkFwdRst_H, INPUT, x ), "& -- 
" 354 ( BC_2, BeCheck_H(3), BIDIR, x, 339, O, Zz ), “& -- 
" 353 ( BC_2, BcCheck_H(11), BIDIR, x, 339, 0, Zz ). "& -- 
" 352 ( BC_2, SysCheck_L(3), BIDIR, x, 336, 0, WEAK ), “& -- 
“ 351 ( BC_2, BcData_H(31), BIDIR, x, 339, O, Zz ), “& -- 
" 350 ( BC_2, BeData_H(95), BIDIR, x, 339, 0, zZ je & == 
“ 349 ( BC_2, SysData_L(31), BIDIR, x, 336, 0, WEAK1 ), “& -- 
“ 348 ( BC_2, BcData_H(30), BIDIR, x, 339, O, Zz ), "& -- 
" 347 ( BC_2, BcData_H(94), BIDIR, x, 339, O, Zz ), "“& -- 
" 346 ( BC_2, SysData_L(30), BIDIR, x, 336, 0, WEAK] ), "“& -- 
" 345 ( BC_2, BceData_H(29), BIDIR, x, 339, O, Zz ), *& -- 
“ 344 ( BC_2, BcData_H(93), BIDIR, x, 339, 0, Zz ), "“& -- 
“ 343 ( BC_2, SysData_L(29), BIDIR, x, 336, 0, WEAK] ), "& -- 
" 342 ( BC_2, BcData_H(28), BIDIR, x, 339, O, Zz ), "“& -- 
“ 341 ( BC_2, BcData_H(92), BIDIR, x, 339, O, Zz ), “& -- 
* 340 ( BC_2, SysData_L(28), BIDIR, x, 336, O, WEAK] ), "& -- 
" 339 ( BC_3, *, CONTROL, 0 ), “"& -- becellod 
" 338 ( BC_3, BeDataInClk_H(3), INPUT, x ), “& -- 
“ 337 ( BC_2, SysDataOutClk_L(3), OUTPUT2, x ), “& -- 
" 336 ( BC_3, *, CONTROL, 0 ), “& -- secelld 
* 335 ( BC_3, SysDataInClk_H(3), INPUT, x ), “& -- 
* 334 ( BC_2, BcData_H(27), BIDIR, x, 339, 0, z ), “& -- 
" 333 ( BC_2, BcData_H(91), BIDIR, x, 339, 0, zZ ), “& -- 
“ 332 ( BC_2, SysData_L(27), BIDIR, x, 336, OQ, WEAKI ), “& -- 
“ 331 ( BC_2, BcData_H(26), BIDIR, x, 339, 0, Zz ), "& -~ 
" 330 ( BC_2, BecData_H(90), BIDIR, x, 339, 0, Zz ), “& -- 
" 329 ( BC_2, SysData_L(26), BIDIR, x, 336, 0, WEAKI }), "& -- 
" 328 ( BC_2, BcData_H(25), BIDIR, x, 339, O, Zz ), “& -- 
" 327 ( BC_2, BcData_H(89), BIDIR, x, 339, 0, Zz ), “& -- 
" 326 ( BC_2, SysData_L(25), BIDIR, x, 336, 0, WEAK] }), “& -- 
" 325 ( BC_2, BcData_H(24), BIDIR, x, 339, 0, Z ), “& -- 
" 324 ( BC_2, BcData_H(88), BIDIR, x, 339, O, Zz ), “& -- 
" 323 ( BC_2, SysData_L(24), BIDIR, x, 336, 9O, WEAK1 ), “& -- 
" 322 ( BC_2, BcDataOutClk_L(1), OUTPUT2, x ), "& --
" 321 ( BC_2, BcDataOutClk_H(1), OUTPUT2, x ), "& --
" 320 ( BC_2, BcCheck_H(2), BIDIR, x, 305, 0, Z ), "& --
" 319 ( BC_2, BcCheck_H(10), BIDIR, x, 305, 0, Z ), "& --
" 318 ( BC_2, SysCheck_L(2), BIDIR, x, 302, 0, WEAK1 ), "& --
" 317 ( BC_2, BcData_H(23), BIDIR, x, 305, 0, Z ), "& --
" 316 ( BC_2, BcData_H(87), BIDIR, x, 305, 0, Z ), "& --
" 315 ( BC_2, SysData_L(23), BIDIR, x, 302, 0, WEAK1 ), "& --
" 314 ( BC_2, BcData_H(22), BIDIR, x, 305, 0, Z ), "& --
" 313 ( BC_2, BcData_H(86), BIDIR, x, 305, 0, Z ), "& --
" 312 ( BC_2, SysData_L(22), BIDIR, x, 302, 0, WEAK1 ), "& --
" 311 ( BC_2, BcData_H(21), BIDIR, x, 305, 0, Z ), "& --
" 310 ( BC_2, BcData_H(85), BIDIR, x, 305, 0, Z ), "& --
" 309 ( BC_2, SysData_L(21), BIDIR, x, 302, 0, WEAK1 ), "& --
" 308 ( BC_2, BcData_H(20), BIDIR, x, 305, 0, Z ), "& --
" 307 ( BC_2, BcData_H(84), BIDIR, x, 305, 0, Z ), "& --
" 306 ( BC_2, SysData_L(20), BIDIR, x, 302, 0, WEAK1 ), "& --
" 305 ( BC_3, *, CONTROL, 0 ), "& -- bccell1
" 304 ( BC_3, BcDataInClk_H(2), INPUT, x ), "& --
" 303 ( BC_2, SysDataOutClk_L(2), OUTPUT2, x ), "& --
" 302 ( BC_3, *, CONTROL, 0 ), "& -- sccell1
" 301 ( BC_3, SysDataInClk_H(2), INPUT, x ), "& --
" 300 ( BC_2, BcData_H(19), BIDIR, x, 305, 0, Z ), "& --
" 299 ( BC_2, BcData_H(83), BIDIR, x, 305, 0, Z ), "& --
" 298 ( BC_2, SysData_L(19), BIDIR, x, 302, 0, WEAK1 ), "& --
" 297 ( BC_2, BcData_H(18), BIDIR, x, 305, 0, Z ), "& --
" 296 ( BC_2, BcData_H(82), BIDIR, x, 305, 0, Z ), "& --
" 295 ( BC_2, SysData_L(18), BIDIR, x, 302, 0, WEAK1 ), "& --
" 294 ( BC_2, BcData_H(17), BIDIR, x, 305, 0, Z ), "& --
" 293 ( BC_2, BcData_H(81), BIDIR, x, 305, 0, Z ), "& --
" 292 ( BC_2, SysData_L(17), BIDIR, x, 302, 0, WEAK1 ), "& --
" 291 ( BC_2, BcData_H(16), BIDIR, x, 305, 0, Z ), "& --
" 290 ( BC_2, BcData_H(80), BIDIR, x, 305, 0, Z ), "& --
" 289 ( BC_2, SysData_L(16), BIDIR, x, 302, 0, WEAK1 ), "& --
" 288 ( BC_2, BcCheck_H(1), BIDIR, x, 273, 0, Z ), "& --
" 287 ( BC_2, BcCheck_H(9), BIDIR, x, 273, 0, Z ), "& --
" 286 ( BC_2, SysCheck_L(1), BIDIR, x, 270, 0, WEAK1 ), "& --
" 285 ( BC_2, BcData_H(15), BIDIR, x, 273, 0, Z ), "& --
" 284 ( BC_2, BcData_H(79), BIDIR, x, 273, 0, Z ), "& --
" 283 ( BC_2, SysData_L(15), BIDIR, x, 270, 0, WEAK1 ), "& --
" 282 ( BC_2, BcData_H(14), BIDIR, x, 273, 0, Z ), "& --
" 281 ( BC_2, BcData_H(78), BIDIR, x, 273, 0, Z ), "& --
" 280 ( BC_2, SysData_L(14), BIDIR, x, 270, 0, WEAK1 ), "& --
" 279 ( BC_2, BcData_H(13), BIDIR, x, 273, 0, Z ), "& --
" 278 ( BC_2, BcData_H(77), BIDIR, x, 273, 0, Z ), "& --
" 277 ( BC_2, SysData_L(13), BIDIR, x, 270, 0, WEAK1 ), "& --
" 276 ( BC_2, BcData_H(12), BIDIR, x, 273, 0, Z ), "& --
" 275 ( BC_2, BcData_H(76), BIDIR, x, 273, 0, Z ), "& --
" 274 ( BC_2, SysData_L(12), BIDIR, x, 270, 0, WEAK1 ), "& --
" 273 ( BC_3, *, CONTROL, 0 ), "& -- bccell2
" 272 ( BC_3, BcDataInClk_H(1), INPUT, x ), "& --
" 271 ( BC_2, SysDataOutClk_L(1), OUTPUT2, x ), "& --
" 270 ( BC_3, *, CONTROL, 0 ), "& -- sccell2
" 269 ( BC_3, SysDataInClk_H(1), INPUT, x ), "& --
" 268 ( BC_2, BcData_H(11), BIDIR, x, 273, 0, Z ), "& --
" 267 ( BC_2, BcData_H(75), BIDIR, x, 273, 0, Z ), "& --
" 266 ( BC_2, SysData_L(11), BIDIR, x, 270, 0, WEAK1 ), "& --
" 265 ( BC_2, BcData_H(10), BIDIR, x, 273, 0, Z ), "& --
" 264 ( BC_2, BcData_H(74), BIDIR, x, 273, 0, Z ), "& --
" 263 ( BC_2, SysData_L(10), BIDIR, x, 270, 0, WEAK1 ), "& --
" 262 ( BC_2, BcData_H(9), BIDIR, x, 273, 0, Z ), "& --
" 261 ( BC_2, BcData_H(73), BIDIR, x, 273, 0, Z ), "& --
" 260 ( BC_2, SysData_L(9), BIDIR, x, 270, 0, WEAK1 ), "& --
" 259 ( BC_2, BcData_H(8), BIDIR, x, 273, 0, Z ), "& --
" 258 ( BC_2, BcData_H(72), BIDIR, x, 273, 0, Z ), "& --
" 257 ( BC_2, SysData_L(8), BIDIR, x, 270, 0, WEAK1 ), "& --
" 256 ( BC_2, BcDataOutClk_L(0), OUTPUT2, x ), "& --
" 255 ( BC_2, BcDataOutClk_H(0), OUTPUT2, x ), "& --
" 254 ( BC_2, BcCheck_H(0), BIDIR, x, 239, 0, Z ), "& --
" 253 ( BC_2, BcCheck_H(8), BIDIR, x, 239, 0, Z ), "& --
" 252 ( BC_2, SysCheck_L(0), BIDIR, x, 236, 0, WEAK1 ), "& --
" 251 ( BC_2, BcData_H(7), BIDIR, x, 239, 0, Z ), "& --
" 250 ( BC_2, BcData_H(71), BIDIR, x, 239, 0, Z ), "& --
" 249 ( BC_2, SysData_L(7), BIDIR, x, 236, 0, WEAK1 ), "& --
" 248 ( BC_2, BcData_H(6), BIDIR, x, 239, 0, Z ), "& --
" 247 ( BC_2, BcData_H(70), BIDIR, x, 239, 0, Z ), "& --
" 246 ( BC_2, SysData_L(6), BIDIR, x, 236, 0, WEAK1 ), "& --
" 245 ( BC_2, BcData_H(5), BIDIR, x, 239, 0, Z ), "& --
" 244 ( BC_2, BcData_H(69), BIDIR, x, 239, 0, Z ), "& --
" 243 ( BC_2, SysData_L(5), BIDIR, x, 236, 0, WEAK1 ), "& --
" 242 ( BC_2, BcData_H(4), BIDIR, x, 239, 0, Z ), "& --
" 241 ( BC_2, BcData_H(68), BIDIR, x, 239, 0, Z ), "& --
" 240 ( BC_2, SysData_L(4), BIDIR, x, 236, 0, WEAK1 ), "& --
" 239 ( BC_3, *, CONTROL, 0 ), "& -- bccell3
" 238 ( BC_3, BcDataInClk_H(0), INPUT, x ), "& --
" 237 ( BC_2, SysDataOutClk_L(0), OUTPUT2, x ), "& --
" 236 ( BC_3, *, CONTROL, 0 ), "& -- sccell3
" 235 ( BC_3, SysDataInClk_H(0), INPUT, x ), "& --
" 234 ( BC_2, BcData_H(3), BIDIR, x, 239, 0, Z ), "& --
" 233 ( BC_2, BcData_H(67), BIDIR, x, 239, 0, Z ), "& --
" 232 ( BC_2, SysData_L(3), BIDIR, x, 236, 0, WEAK1 ), "& --
" 231 ( BC_2, BcData_H(2), BIDIR, x, 239, 0, Z ), "& --
" 230 ( BC_2, BcData_H(66), BIDIR, x, 239, 0, Z ), "& --
" 229 ( BC_2, SysData_L(2), BIDIR, x, 236, 0, WEAK1 ), "& --
" 228 ( BC_2, BcData_H(1), BIDIR, x, 239, 0, Z ), "& --
" 227 ( BC_2, BcData_H(65), BIDIR, x, 239, 0, Z ), "& --
" 226 ( BC_2, SysData_L(1), BIDIR, x, 236, 0, WEAK1 ), "& --
" 225 ( BC_2, BcData_H(0), BIDIR, x, 239, 0, Z ), "& --
" 224 ( BC_2, BcData_H(64), BIDIR, x, 239, 0, Z ), "& --
" 223 ( BC_2, SysData_L(0), BIDIR, x, 236, 0, WEAK1 ), "& --
" 222 ( BC_2, BcTag_H(20), BIDIR, x, 208, 0, Z ), "& --
" 221 ( BC_2, BcTag_H(21), BIDIR, x, 208, 0, Z ), "& --
" 220 ( BC_2, BcTag_H(22), BIDIR, x, 208, 0, Z ), "& --
" 219 ( BC_2, BcTag_H(23), BIDIR, x, 208, 0, Z ), "& --
" 218 ( BC_2, BcTag_H(24), BIDIR, x, 208, 0, Z ), "& --
" 217 ( BC_2, BcTag_H(25), BIDIR, x, 208, 0, Z ), "& --
" 216 ( BC_2, BcTag_H(26), BIDIR, x, 208, 0, Z ), "& --
" 215 ( BC_2, BcTag_H(27), BIDIR, x, 208, 0, Z ), "& --
" 214 ( BC_2, BcTag_H(28), BIDIR, x, 208, 0, Z ), "& --
" 213 ( BC_2, BcTag_H(29), BIDIR, x, 208, 0, Z ), "& --
" 212 ( BC_2, BcTag_H(30), BIDIR, x, 208, 0, Z ), "& --
" 211 ( BC_2, BcTag_H(31), BIDIR, x, 208, 0, Z ), "& --
" 210 ( BC_2, BcTag_H(32), BIDIR, x, 208, 0, Z ), "& --
" 209 ( BC_2, BcTag_H(33), BIDIR, x, 208, 0, Z ), "& --
" 208 ( BC_3, *, CONTROL, 0 ), "& -- tccell0
" 207 ( BC_3, BcTagInClk_H, INPUT, x ), "& --
" 206 ( BC_2, BcTag_H(34), BIDIR, x, 208, 0, Z ), "& --
" 205 ( BC_2, BcTag_H(35), BIDIR, x, 208, 0, Z ), "& --
" 204 ( BC_2, BcTag_H(36), BIDIR, x, 208, 0, Z ), "& --
" 203 ( BC_2, BcTag_H(37), BIDIR, x, 208, 0, Z ), "& --
" 202 ( BC_2, BcTag_H(38), BIDIR, x, 208, 0, Z ), "& --
" 201 ( BC_2, BcTag_H(39), BIDIR, x, 208, 0, Z ), "& --
" 200 ( BC_2, BcTag_H(40), BIDIR, x, 208, 0, Z ), "& --
" 199 ( BC_2, BcTag_H(41), BIDIR, x, 208, 0, Z ), "& --
" 198 ( BC_2, BcTag_H(42), BIDIR, x, 208, 0, Z ), "& --
" 197 ( BC_2, BcTagParity_H, BIDIR, x, 208, 0, Z ), "& --
" 196 ( BC_2, BcTagShared_H, BIDIR, x, 208, 0, Z ), "& --
" 195 ( BC_2, BcTagDirty_H, BIDIR, x, 208, 0, Z ), "& --
" 194 ( BC_2, BcTagValid_H, BIDIR, x, 208, 0, Z ), "& --
" 193 ( BC_2, BcTagOutClk_L, OUTPUT2, x ), "& --
" 192 ( BC_2, BcTagOutClk_H, OUTPUT2, x ), "& --
" 191 ( BC_2, BcTagOE_L, OUTPUT2, x ), "& --
" 190 ( BC_2, BcTagWr_L, OUTPUT2, x ), "& --
" 189 ( BC_2, BcDataWr_L, OUTPUT2, x ), "& --
" 188 ( BC_2, BcLoad_L, OUTPUT2, x ), "& --
" 187 ( BC_2, BcDataOE_L, OUTPUT2, x ), "& --
" 186 ( BC_2, BcAdd_H(4), OUTPUT2, x ), "& --
" 185 ( BC_2, BcAdd_H(5), OUTPUT2, x ), "& --
" 184 ( BC_2, BcAdd_H(6), OUTPUT2, x ), "& --
" 183 ( BC_2, BcAdd_H(7), OUTPUT2, x ), "& --
" 182 ( BC_2, BcAdd_H(8), OUTPUT2, x ), "& --
" 181 ( BC_2, BcAdd_H(9), OUTPUT2, x ), "& --
" 180 ( BC_2, BcAdd_H(10), OUTPUT2, x ), "& --
" 179 ( BC_2, BcAdd_H(11), OUTPUT2, x ), "& --
" 178 ( BC_2, BcAdd_H(12), OUTPUT2, x ), "& --
" 177 ( BC_2, BcAdd_H(13), OUTPUT2, x ), "& --
" 176 ( BC_2, BcAdd_H(14), OUTPUT2, x ), "& --
" 175 ( BC_2, BcAdd_H(15), OUTPUT2, x ), "& --
" 174 ( BC_2, BcAdd_H(16), OUTPUT2, x ), "& --
" 173 ( BC_2, BcAdd_H(17), OUTPUT2, x ), "& --
" 172 ( BC_2, BcAdd_H(18), OUTPUT2, x ), "& --
" 171 ( BC_2, BcAdd_H(19), OUTPUT2, x ), "& --
" 170 ( BC_2, BcAdd_H(20), OUTPUT2, x ), "& --
" 169 ( BC_2, BcAdd_H(21), OUTPUT2, x ), "& --
" 168 ( BC_2, BcAdd_H(22), OUTPUT2, x ), "& --
" 167 ( BC_2, BcAdd_H(23), OUTPUT2, x ), "& --
" 166 ( BC_2, SysData_L(32), BIDIR, x, 150, 0, WEAK1 ), "& --
" 165 ( BC_2, BcData_H(96), BIDIR, x, 153, 0, Z ), "& --
" 164 ( BC_2, BcData_H(32), BIDIR, x, 153, 0, Z ), "& --
" 163 ( BC_2, SysData_L(33), BIDIR, x, 150, 0, WEAK1 ), "& --
" 162 ( BC_2, BcData_H(97), BIDIR, x, 153, 0, Z ), "& --
" 161 ( BC_2, BcData_H(33), BIDIR, x, 153, 0, Z ), "& --
" 160 ( BC_2, SysData_L(34), BIDIR, x, 150, 0, WEAK1 ), "& --
" 159 ( BC_2, BcData_H(98), BIDIR, x, 153, 0, Z ), "& --
" 158 ( BC_2, BcData_H(34), BIDIR, x, 153, 0, Z ), "& --
" 157 ( BC_2, SysData_L(35), BIDIR, x, 150, 0, WEAK1 ), "& --
" 156 ( BC_2, BcData_H(99), BIDIR, x, 153, 0, Z ), "& --
" 155 ( BC_2, BcData_H(35), BIDIR, x, 153, 0, Z ), "& --
" 154 ( BC_3, SysDataInClk_H(4), INPUT, x ), "& --
" 153 ( BC_3, *, CONTROL, 0 ), "& -- sccell4
" 152 ( BC_2, SysDataOutClk_L(4), OUTPUT2, x ), "& --
" 151 ( BC_3, BcDataInClk_H(4), INPUT, x ), "& --
" 150 ( BC_3, *, CONTROL, 0 ), "& -- bccell4
" 149 ( BC_2, SysData_L(36), BIDIR, x, 150, 0, WEAK1 ), "& --
" 148 ( BC_2, BcData_H(100), BIDIR, x, 153, 0, Z ), "& --
" 147 ( BC_2, BcData_H(36), BIDIR, x, 153, 0, Z ), "& --
" 146 ( BC_2, SysData_L(37), BIDIR, x, 150, 0, WEAK1 ), "& --
" 145 ( BC_2, BcData_H(101), BIDIR, x, 153, 0, Z ), "& --
" 144 ( BC_2, BcData_H(37), BIDIR, x, 153, 0, Z ), "& --
" 143 ( BC_2, SysData_L(38), BIDIR, x, 150, 0, WEAK1 ), "& --
" 142 ( BC_2, BcData_H(102), BIDIR, x, 153, 0, Z ), "& --
" 141 ( BC_2, BcData_H(38), BIDIR, x, 153, 0, Z ), "& --
" 140 ( BC_2, SysData_L(39), BIDIR, x, 150, 0, WEAK1 ), "& --
" 139 ( BC_2, BcData_H(103), BIDIR, x, 153, 0, Z ), "& --
" 138 ( BC_2, BcData_H(39), BIDIR, x, 153, 0, Z ), "& --
" 137 ( BC_2, SysCheck_L(4), BIDIR, x, 150, 0, WEAK1 ), "& --
" 136 ( BC_2, BcCheck_H(12), BIDIR, x, 153, 0, Z ), "& --
" 135 ( BC_2, BcCheck_H(4), BIDIR, x, 153, 0, Z ), "& --
" 134 ( BC_2, BcDataOutClk_H(2), OUTPUT2, x ), "& --
" 133 ( BC_2, BcDataOutClk_L(2), OUTPUT2, x ), "& --
" 132 ( BC_2, SysData_L(40), BIDIR, x, 119, 0, WEAK1 ), "& --
" 131 ( BC_2, BcData_H(104), BIDIR, x, 116, 0, Z ), "& --
" 130 ( BC_2, BcData_H(40), BIDIR, x, 116, 0, Z ), "& --
" 129 ( BC_2, SysData_L(41), BIDIR, x, 119, 0, WEAK1 ), "& --
" 128 ( BC_2, BcData_H(105), BIDIR, x, 116, 0, Z ), "& --
" 127 ( BC_2, BcData_H(41), BIDIR, x, 116, 0, Z ), "& --
" 126 ( BC_2, SysData_L(42), BIDIR, x, 119, 0, WEAK1 ), "& --
" 125 ( BC_2, BcData_H(106), BIDIR, x, 116, 0, Z ), "& --
" 124 ( BC_2, BcData_H(42), BIDIR, x, 116, 0, Z ), "& --
" 123 ( BC_2, SysData_L(43), BIDIR, x, 119, 0, WEAK1 ), "& --
" 122 ( BC_2, BcData_H(107), BIDIR, x, 116, 0, Z ), "& --
" 121 ( BC_2, BcData_H(43), BIDIR, x, 116, 0, Z ), "& --
" 120 ( BC_3, SysDataInClk_H(5), INPUT, x ), "& --
" 119 ( BC_3, *, CONTROL, 0 ), "& -- sccell5
" 118 ( BC_2, SysDataOutClk_L(5), OUTPUT2, x ), "& --
" 117 ( BC_3, BcDataInClk_H(5), INPUT, x ), "& --
" 116 ( BC_3, *, CONTROL, 0 ), "& -- bccell5
" 115 ( BC_2, SysData_L(44), BIDIR, x, 119, 0, WEAK1 ), "& --
" 114 ( BC_2, BcData_H(108), BIDIR, x, 116, 0, Z ), "& --
" 113 ( BC_2, BcData_H(44), BIDIR, x, 116, 0, Z ), "& --
" 112 ( BC_2, SysData_L(45), BIDIR, x, 119, 0, WEAK1 ), "& --
" 111 ( BC_2, BcData_H(109), BIDIR, x, 116, 0, Z ), "& --
" 110 ( BC_2, BcData_H(45), BIDIR, x, 116, 0, Z ), "& --
" 109 ( BC_2, SysData_L(46), BIDIR, x, 119, 0, WEAK1 ), "& --
" 108 ( BC_2, BcData_H(110), BIDIR, x, 116, 0, Z ), "& --
" 107 ( BC_2, BcData_H(46), BIDIR, x, 116, 0, Z ), "& --
" 106 ( BC_2, SysData_L(47), BIDIR, x, 119, 0, WEAK1 ), "& --
" 105 ( BC_2, BcData_H(111), BIDIR, x, 116, 0, Z ), "& --
" 104 ( BC_2, BcData_H(47), BIDIR, x, 116, 0, Z ), "& --
" 103 ( BC_2, SysCheck_L(5), BIDIR, x, 119, 0, WEAK1 ), "& --
" 102 ( BC_2, BcCheck_H(13), BIDIR, x, 116, 0, Z ), "& --
" 101 ( BC_2, BcCheck_H(5), BIDIR, x, 116, 0, Z ), "& --
" 100 ( BC_2, SysData_L(48), BIDIR, x, 87, 0, WEAK1 ), "& --
" 99 ( BC_2, BcData_H(112), BIDIR, x, 84, 0, Z ), "& --
" 98 ( BC_2, BcData_H(48), BIDIR, x, 84, 0, Z ), "& --
" 97 ( BC_2, SysData_L(49), BIDIR, x, 87, 0, WEAK1 ), "& --
" 96 ( BC_2, BcData_H(113), BIDIR, x, 84, 0, Z ), "& --
" 95 ( BC_2, BcData_H(49), BIDIR, x, 84, 0, Z ), "& --
" 94 ( BC_2, SysData_L(50), BIDIR, x, 87, 0, WEAK1 ), "& --
" 93 ( BC_2, BcData_H(114), BIDIR, x, 84, 0, Z ), "& --
" 92 ( BC_2, BcData_H(50), BIDIR, x, 84, 0, Z ), "& --
" 91 ( BC_2, SysData_L(51), BIDIR, x, 87, 0, WEAK1 ), "& --
" 90 ( BC_2, BcData_H(115), BIDIR, x, 84, 0, Z ), "& --
" 89 ( BC_2, BcData_H(51), BIDIR, x, 84, 0, Z ), "& --
" 88 ( BC_3, SysDataInClk_H(6), INPUT, x ), "& --
" 87 ( BC_3, *, CONTROL, 0 ), "& -- sccell6
" 86 ( BC_2, SysDataOutClk_L(6), OUTPUT2, x ), "& --
" 85 ( BC_3, BcDataInClk_H(6), INPUT, x ), "& --
" 84 ( BC_3, *, CONTROL, 0 ), "& -- bccell6
" 83 ( BC_2, SysData_L(52), BIDIR, x, 87, 0, WEAK1 ), "& --
" 82 ( BC_2, BcData_H(116), BIDIR, x, 84, 0, Z ), "& --
" 81 ( BC_2, BcData_H(52), BIDIR, x, 84, 0, Z ), "& --
" 80 ( BC_2, SysData_L(53), BIDIR, x, 87, 0, WEAK1 ), "& --
" 79 ( BC_2, BcData_H(117), BIDIR, x, 84, 0, Z ), "& --
" 78 ( BC_2, BcData_H(53), BIDIR, x, 84, 0, Z ), "& --
" 77 ( BC_2, SysData_L(54), BIDIR, x, 87, 0, WEAK1 ), "& --
" 76 ( BC_2, BcData_H(118), BIDIR, x, 84, 0, Z ), "& --
" 75 ( BC_2, BcData_H(54), BIDIR, x, 84, 0, Z ), "& --
" 74 ( BC_2, SysData_L(55), BIDIR, x, 87, 0, WEAK1 ), "& --
" 73 ( BC_2, BcData_H(119), BIDIR, x, 84, 0, Z ), "& --
" 72 ( BC_2, BcData_H(55), BIDIR, x, 84, 0, Z ), "& --
" 71 ( BC_2, SysCheck_L(6), BIDIR, x, 87, 0, WEAK1 ), "& --
" 70 ( BC_2, BcCheck_H(14), BIDIR, x, 84, 0, Z ), "& --
" 69 ( BC_2, BcCheck_H(6), BIDIR, x, 84, 0, Z ), "& --
" 68 ( BC_2, BcDataOutClk_H(3), OUTPUT2, x ), "& --
" 67 ( BC_2, BcDataOutClk_L(3), OUTPUT2, x ), "& --
" 66 ( BC_2, SysData_L(56), BIDIR, x, 53, 0, WEAK1 ), "& --
" 65 ( BC_2, BcData_H(120), BIDIR, x, 50, 0, Z ), "& --
" 64 ( BC_2, BcData_H(56), BIDIR, x, 50, 0, Z ), "& --
" 63 ( BC_2, SysData_L(57), BIDIR, x, 53, 0, WEAK1 ), "& --
" 62 ( BC_2, BcData_H(121), BIDIR, x, 50, 0, Z ), "& --
" 61 ( BC_2, BcData_H(57), BIDIR, x, 50, 0, Z ), "& --
" 60 ( BC_2, SysData_L(58), BIDIR, x, 53, 0, WEAK1 ), "& --
" 59 ( BC_2, BcData_H(122), BIDIR, x, 50, 0, Z ), "& --
" 58 ( BC_2, BcData_H(58), BIDIR, x, 50, 0, Z ), "& --
" 57 ( BC_2, SysData_L(59), BIDIR, x, 53, 0, WEAK1 ), "& --
" 56 ( BC_2, BcData_H(123), BIDIR, x, 50, 0, Z ), "& --
" 55 ( BC_2, BcData_H(59), BIDIR, x, 50, 0, Z ), "& --
" 54 ( BC_3, SysDataInClk_H(7), INPUT, x ), "& --
" 53 ( BC_3, *, CONTROL, 0 ), "& -- sccell7
" 52 ( BC_2, SysDataOutClk_L(7), OUTPUT2, x ), "& --
" 51 ( BC_3, BcDataInClk_H(7), INPUT, x ), "& --
" 50 ( BC_3, *, CONTROL, 0 ), "& -- bccell7
" 49 ( BC_2, SysData_L(60), BIDIR, x, 53, 0, WEAK1 ), "& --
" 48 ( BC_2, BcData_H(124), BIDIR, x, 50, 0, Z ), "& --
" 47 ( BC_2, BcData_H(60), BIDIR, x, 50, 0, Z ), "& --
" 46 ( BC_2, SysData_L(61), BIDIR, x, 53, 0, WEAK1 ), "& --
" 45 ( BC_2, BcData_H(125), BIDIR, x, 50, 0, Z ), "& --
" 44 ( BC_2, BcData_H(61), BIDIR, x, 50, 0, Z ), "& --
" 43 ( BC_2, SysData_L(62), BIDIR, x, 53, 0, WEAK1 ), "& --
" 42 ( BC_2, BcData_H(126), BIDIR, x, 50, 0, Z ), "& --
" 41 ( BC_2, BcData_H(62), BIDIR, x, 50, 0, Z ), "& --
" 40 ( BC_2, SysData_L(63), BIDIR, x, 53, 0, WEAK1 ), "& --
" 39 ( BC_2, BcData_H(127), BIDIR, x, 50, 0, Z ), "& --
" 38 ( BC_2, BcData_H(63), BIDIR, x, 50, 0, Z ), "& --
" 37 ( BC_2, SysCheck_L(7), BIDIR, x, 53, 0, WEAK1 ), "& --
" 36 ( BC_2, BcCheck_H(15), BIDIR, x, 50, 0, Z ), "& --
" 35 ( BC_2, BcCheck_H(7), BIDIR, x, 50, 0, Z ), "& --
" 34 ( BC_2, SysAddOut_L(0), OUTPUT2, x ), "& --
" 33 ( BC_2, SysAddOut_L(1), OUTPUT2, x ), "& --
" 32 ( BC_2, SysAddOut_L(2), OUTPUT2, x ), "& --
" 31 ( BC_2, SysAddOut_L(3), OUTPUT2, x ), "& --
" 30 ( BC_2, SysAddOut_L(4), OUTPUT2, x ), "& --
" 29 ( BC_2, SysAddOut_L(5), OUTPUT2, x ), "& --
" 28 ( BC_2, SysAddOut_L(6), OUTPUT2, x ), "& --
" 27 ( BC_2, SysAddOut_L(7), OUTPUT2, x ), "& --
" 26 ( BC_2, SysAddOutClk_L, OUTPUT2, x ), "& --
" 25 ( BC_2, SysAddOut_L(8), OUTPUT2, x ), "& --
" 24 ( BC_2, SysAddOut_L(9), OUTPUT2, x ), "& --
" 23 ( BC_2, SysAddOut_L(10), OUTPUT2, x ), "& --
" 22 ( BC_2, SysAddOut_L(11), OUTPUT2, x ), "& --
" 21 ( BC_2, SysAddOut_L(12), OUTPUT2, x ), "& --
" 20 ( BC_2, SysAddOut_L(13), OUTPUT2, x ), "& --
" 19 ( BC_2, SysAddOut_L(14), OUTPUT2, x ), "& --
" 18 ( BC_3, SysAddIn_L(0), INPUT, x ), "& --
" 17 ( BC_3, SysAddIn_L(1), INPUT, x ), "& --
" 16 ( BC_3, SysAddIn_L(2), INPUT, x ), "& --
" 15 ( BC_3, SysAddIn_L(3), INPUT, x ), "& --
" 14 ( BC_3, SysAddIn_L(4), INPUT, x ), "& --
" 13 ( BC_3, SysAddIn_L(5), INPUT, x ), "& --
" 12 ( BC_3, SysAddIn_L(6), INPUT, x ), "& --
" 11 ( BC_3, SysAddIn_L(7), INPUT, x ), "& --
" 10 ( BC_3, SysAddIn_L(8), INPUT, x ), "& --
" 9 ( BC_3, SysAddInClk_L, INPUT, x ), "& --
" 8 ( BC_3, SysAddIn_L(9), INPUT, x ), "& --
" 7 ( BC_3, SysAddIn_L(10), INPUT, x ), "& --
" 6 ( BC_3, SysAddIn_L(11), INPUT, x ), "& --
" 5 ( BC_3, SysAddIn_L(12), INPUT, x ), "& --
" 4 ( BC_3, SysAddIn_L(13), INPUT, x ), "& --
" 3 ( BC_3, SysAddIn_L(14), INPUT, x ), "& --
" 2 ( BC_3, SysFillValid_L, INPUT, x ), "& --
" 1 ( BC_3, SysDataInValid_L, INPUT, x ), "& --
" 0 ( BC_3, SysDataOutValid_L, INPUT, x ) ";


attribute DESIGN_WARNING of Alpha_21264a: entity is

"1. IEEE 1149.1 circuits on Alpha 21264a are designed primarily to support "&
" testing in off-line module manufacturing environment. The SAMPLE/PRELOAD"&
" instruction support is designed primarily for supporting interconnection"&
" verification test and not for at-speed samples of pin data. "&
"2. TDO is Open-Drain signal. "&
"3. Add comment on port pin electrical characteristics: "&

"4. Comment out if compiler does not support this statement. ";
end Alpha_21264a; 
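
Each boundary-register entry in the listing above follows the standard BSDL cell layout: cell number, cell type, port name, function, safe value and, for bidirectional cells, the controlling cell number, its disable value, and the resulting pin state. The Python sketch below shows one way such an entry could be split into fields. The function name `parse_cell` and the dictionary keys are illustrative assumptions and are not part of any Compaq tool.

```python
def parse_cell(line):
    """Split one BSDL boundary-register entry into named fields.

    Hypothetical helper for illustration only. It expects a cleaned-up
    entry such as:
      " 329 ( BC_2, SysData_L(26), BIDIR, x, 336, 0, WEAK1 ), "& --
    """
    text = line.split("--")[0]                      # drop the trailing VHDL comment
    text = text.replace('"', "").replace("&", "")   # drop string quoting and concatenation
    text = text.strip().rstrip(",;").strip()        # drop the entry separator or terminator
    number, rest = text.split("(", 1)               # cell number precedes the first "("
    body = rest.rsplit(")", 1)[0]                   # fields end at the last ")"
    fields = [f.strip() for f in body.split(",")]
    cell = {
        "num": int(number),
        "cell": fields[0],
        "port": fields[1],
        "function": fields[2],
        "safe": fields[3],
    }
    if len(fields) == 7:                            # cells with an output-enable control
        cell["ccell"] = int(fields[4])
        cell["disval"] = fields[5]
        cell["rslt"] = fields[6]
    return cell


if __name__ == "__main__":
    print(parse_cell('" 329 ( BC_2, SysData_L(26), BIDIR, x, 336, 0, WEAK1 ), "& --'))
    print(parse_cell('" 305 ( BC_3, *, CONTROL, 0 ), "& -- bccell1'))
```

Read this way, cell 329 drives SysData_L(26) and is disabled to a weak high when control cell 336 holds 0.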
Serial Icache Load Predecode Values


See the Alpha Motherboards Software Developer’s Kit (SDK) for information.
PALcode Restrictions and Guidelines


D.1 Restriction 1: Reset Sequence Required by Retire Logic and Mapper


For convenience of implementation, the Ibox retire logic done status bits are not
initialized during reset. Instead, as shown in the example below, the first batch of valid
instructions sweeps through inum-space and initializes these bits. The 80 status bits
(one for each inflight instruction) must be marked not done by the first 80 instructions
mapped after reset, and later marked done when those instructions are retired.
Therefore, the first 20 fetch blocks must contain four valid instructions each, and must not
contain any retire logic NOP instructions.
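
The count of 20 fetch blocks follows directly from 80 status bits at four instructions per block. The Python sketch below restates that arithmetic and checks a candidate reset prefix against the rule. The data model, a list of fetch blocks holding `(mnemonic, is_retire_nop)` pairs, and the function names are assumptions made for illustration; the sketch does not decide which encodings the retire logic treats as NOPs.

```python
INFLIGHT_SLOTS = 80          # "done" status bits, one per inflight instruction
INSNS_PER_FETCH_BLOCK = 4    # four valid instructions per fetch block


def required_fetch_blocks(slots=INFLIGHT_SLOTS, per_block=INSNS_PER_FETCH_BLOCK):
    """Number of full fetch blocks needed to touch every status bit once."""
    return -(-slots // per_block)   # ceiling division


def reset_prefix_ok(blocks):
    """Return True if the leading fetch blocks satisfy Restriction 1.

    `blocks` is a list of fetch blocks; each block is a list of
    (mnemonic, is_retire_nop) tuples. This layout is hypothetical.
    """
    needed = required_fetch_blocks()
    if len(blocks) < needed:
        return False
    for block in blocks[:needed]:
        if len(block) != INSNS_PER_FETCH_BLOCK:
            return False
        if any(is_nop for _, is_nop in block):
            return False
    return True


if __name__ == "__main__":
    good = [[("addq", False)] * 4 for _ in range(20)]
    print(required_fetch_blocks(), reset_prefix_ok(good))
```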


reset:
/*
** (1) Initialize 80 retirator "done" status bits and
**     the integer and floating mapper destinations.
**
** (2) Do a MTPR ITB_IA, which turns on the mapper source
**     enables.
**
** (3) Create a map stall to complete the ITB_IA.
**
** State after execution of this code:
**     retirator initialized
**     destinations mapped
**     source mapping enabled
**     itb flushed
**
** The PALcode need not assume the following since the SROM is not
** required to do these:
**     dtb flushed
**     dtb_asn0 0
**     dtb_asn1 0
**     dtb_alt_mode 0
**
** Initialize retirator and destination map, doing 80 retires.
*/
        addq r31,r31,r0  /* initialize Int. Reg. 0*/
        addq r31,r31,r1  /* initialize Int. Reg. 1*/
        addt f31,f31,f0  /* initialize F.P. Reg. 0*/
        mult f31,f31,f1  /* initialize F.P. Reg. 1*/
        addq r31,r31,r2  /* initialize Int. Reg. 2*/
        addq r31,r31,r3  /* initialize Int. Reg. 3*/
        addt f31,f31,f2  /* initialize F.P. Reg. 2*/
        mult f31,f31,f3  /* initialize F.P. Reg. 3*/
        addq r31,r31,r4  /* initialize Int. Reg. 4*/
        addq r31,r31,r5  /* initialize Int. Reg. 5*/
        addt f31,f31,f4  /* initialize F.P. Reg. 4*/
        mult f31,f31,f5  /* initialize F.P. Reg. 5*/
        addq r31,r31,r6  /* initialize Int. Reg. 6*/
        addq r31,r31,r7  /* initialize Int. Reg. 7*/
        addt f31,f31,f6  /* initialize F.P. Reg. 6*/
        mult f31,f31,f7  /* initialize F.P. Reg. 7*/
        addq r31,r31,r8  /* initialize Int. Reg. 8*/
        addq r31,r31,r9  /* initialize Int. Reg. 9*/
        addt f31,f31,f8  /* initialize F.P. Reg. 8*/
        mult f31,f31,f9  /* initialize F.P. Reg. 9*/
        addq r31,r31,r10  /* initialize Int. Reg. 10*/
        addq r31,r31,r11  /* initialize Int. Reg. 11*/
        addt f31,f31,f10  /* initialize F.P. Reg. 10*/
        mult f31,f31,f11  /* initialize F.P. Reg. 11*/
        addq r31,r31,r12  /* initialize Int. Reg. 12*/
        addq r31,r31,r13  /* initialize Int. Reg. 13*/
        addt f31,f31,f12  /* initialize F.P. Reg. 12*/
        mult f31,f31,f13  /* initialize F.P. Reg. 13*/
        addq r31,r31,r14  /* initialize Int. Reg. 14*/
        addq r31,r31,r15  /* initialize Int. Reg. 15*/
        addt f31,f31,f14  /* initialize F.P. Reg. 14*/
        mult f31,f31,f15  /* initialize F.P. Reg. 15*/
        addq r31,r31,r16  /* initialize Int. Reg. 16*/
        addq r31,r31,r17  /* initialize Int. Reg. 17*/
        addt f31,f31,f16  /* initialize F.P. Reg. 16*/
        mult f31,f31,f17  /* initialize F.P. Reg. 17*/
        addq r31,r31,r18  /* initialize Int. Reg. 18*/
        addq r31,r31,r19  /* initialize Int. Reg. 19*/
        addt f31,f31,f18  /* initialize F.P. Reg. 18*/
        mult f31,f31,f19  /* initialize F.P. Reg. 19*/
        addq r31,r31,r20  /* initialize Int. Reg. 20*/
        addq r31,r31,r21  /* initialize Int. Reg. 21*/
        addt f31,f31,f20  /* initialize F.P. Reg. 20*/
        mult f31,f31,f21  /* initialize F.P. Reg. 21*/
        addq r31,r31,r22  /* initialize Int. Reg. 22*/
        addq r31,r31,r23  /* initialize Int. Reg. 23*/
        addt f31,f31,f22  /* initialize F.P. Reg. 22*/
        mult f31,f31,f23  /* initialize F.P. Reg. 23*/
        addq r31,r31,r24  /* initialize Int. Reg. 24*/
        addq r31,r31,r25  /* initialize Int. Reg. 25*/
        addt f31,f31,f24  /* initialize F.P. Reg. 24*/
        mult f31,f31,f25  /* initialize F.P. Reg. 25*/
        addq r31,r31,r26  /* initialize Int. Reg. 26*/
        addq r31,r31,r27  /* initialize Int. Reg. 27*/
        addt f31,f31,f26  /* initialize F.P. Reg. 26*/
        mult f31,f31,f27  /* initialize F.P. Reg. 27*/
        addq r31,r31,r28  /* initialize Int. Reg. 28*/
        addq r31,r31,r29  /* initialize Int. Reg. 29*/
        addt f31,f31,f28  /* initialize F.P. Reg. 28*/
        mult f31,f31,f29  /* initialize F.P. Reg. 29*/
        addq r31,r31,r30  /* initialize Int. Reg. 30*/
        addt f31,f31,f30  /* initialize F.P. Reg. 30*/
        addq r31,r31,r0  /* initialize retirator 63*/
        addq r31,r31,r0  /* initialize retirator 64*/
        addq r31,r31,r0  /* initialize retirator 65*/
        addq r31,r31,r0  /* initialize retirator 66*/
        addq r31,r31,r0  /* initialize retirator 67*/
        addq r31,r31,r0  /* initialize retirator 68*/
        addq r31,r31,r0  /* initialize retirator 69*/
        addq r31,r31,r0  /* initialize retirator 70*/
        addq r31,r31,r0  /* initialize retirator 71*/
        addq r31,r31,r0  /* initialize retirator 72*/
        addq r31,r31,r0  /* initialize retirator 73*/
        addq r31,r31,r0  /* initialize retirator 74*/
        addq r31,r31,r0  /* initialize retirator 75*/
        addq r31,r31,r0  /* initialize retirator 76*/
        addq r31,r31,r0  /* initialize retirator 77*/
        addq r31,r31,r0  /* initialize retirator 78*/
        addq r31,r31,r0  /* initialize retirator 79*/
        addq r31,r31,r0  /* initialize retirator 80*/
        /* stop deleting*/
        mtpr r31,EV6__ITB_IA  /* flush the ITB (SCRBRD=4) *** this also
                                 turns on mapper source enables ****/
        mtpr r31,EV6__DTB_IA  /* flush the DTB (SCRBRD=7)*/
        mtpr r31,EV6__VA_CTL  /* clear VA_CTL (SCRBRD=5) */
        mtpr r31,EV6__M_CTL  /* clear M_CTL (SCRBRD=6) */

/*
** Create a stall outside the IQ until the mtpr EV6__ITB_IA retires.
** We can use DTB_ASNx even though we don't seem to follow the restriction on
** scoreboard bits (4-7). It's okay because there are no real dstream
** operations happening.
*/
        mtpr r31,EV6__DTB_ASN0  /* clear DTB_ASN0 (SCRBRD=4) creates a map-stall
                                   under the above mtpr to SCRBRD=4*/
        mtpr r31,EV6__DTB_ASN1  /* clear DTB_ASN1 (SCRBRD=7) */
        mtpr r31,EV6__CC_CTL  /* clear CC_CTL (SCRBRD=5) */
        mtpr r31,EV6__DTB_ALT_MODE  /* clear DTB_ALT_MODE (SCRBRD=6) */
/*
** MAP_SHADOW_REGISTERS
**
** The shadow registers are mapped. This code may be done by the SROM
** or the PALcode, but it must be done in the manner and order below.
**
** It assumes that the retirator has been initialized, that the
** non-shadow registers are mapped, and that mapper source enables are on.
**
** Source enables are on. For fault-reset and wake from sleep, we need to
** ensure we are in the icache so we don't fetch junk that touches the
** shadow sources before we write the destinations. For normal reset,
** we are already in the icache. However, so this macro is useful for
** all cases, force the code into the icache before doing the mapping.
**
** Assume for fault-reset, and wake from sleep case, the exc_addr is
** stored in r1.
*/
        addq r31,r31,r0  /* nop*/
        addq r31,r31,r0  /* nop*/
        addq r31,r31,r0  /* nop*/
        br r31, tch0  /* fetch in next block*/

        .align 3
nxt0:   lda r0,0x0086(r31)  /* load I_CTL..... */
        mtpr r0,EV6__I_CTL  /* ..... SDE=2, IC_EN=3 (SCRBRD=4) */
        br r31, nxt1  /* continue executing in next block*/
tch0:   br r31, tch1  /* fetch in next block*/

nxt1:   mtpr r31,EV6__IER_CM  /* clear IER_CM (SCRBRD=4) creates a map-stall
                                 under the above mtpr to SCRBRD=4*/
        addq r31,r31,r0  /* nop*/
        br r31, nxt2  /* continue executing in next block*/
tch1:   br r31, tch2  /* fetch in next block*/

nxt2:   addq r31,r31,r0  /* 1st buffer fetch block for above map-stall*/
        addq r31,r31,r0  /* nop*/
        br r31, nxt3  /* continue executing in next block*/
tch2:   br r31, tch3  /* fetch in next block*/

nxt3:   addq r31,r31,r0  /* 2nd buffer fetch block for above map-stall*/
        addq r31,r31,r0  /* nop*/
        br r31, nxt4  /* continue executing in next block*/
tch3:   br r31, tch4  /* fetch in next block*/

nxt4:   addq r31,r31,r0  /* need 3rd buffer fetch block to get correct
                            SDE bit for next fetch block*/
        addq r31,r31,r0  /* nop*/
        br r31, nxt5  /* continue executing in next block*/
tch4:   br r31, tch5  /* fetch in next block*/

nxt5:   addq r31,r31,r4  /* initialize Shadow Reg. 0*/
        addq r31,r31,r5  /* initialize Shadow Reg. 1*/
        br r31, nxt6  /* continue executing in next block*/
tch5:   br r31, tch6  /* fetch in next block*/

nxt6:   addq r31,r31,r6  /* initialize Shadow Reg. 2*/
        addq r31,r31,r7  /* initialize Shadow Reg. 3*/
        br r31, nxt7  /* continue executing in next block*/
tch6:   br r31, tch7  /* fetch in next block*/

nxt7:   addq r31,r31,r20  /* initialize Shadow Reg. 4*/
        addq r31,r31,r21  /* initialize Shadow Reg. 5*/
        br r31, nxt8  /* continue executing in next block*/
tch7:   br r31, tch8  /* fetch in next block*/

nxt8:   addq r31,r31,r22  /* initialize Shadow Reg. 6*/
        addq r31,r31,r23  /* initialize Shadow Reg. 7*/
        br r31, nxt9  /* continue executing in next block*/
tch8:   br r31, nxt0  /* go back to 1st block and start executing*/
nxt9:
/*
** INIT_WRITE_MANY
**
** Write the cbox write many chain, initializing the bcache configuration.
** This code is on a cache block boundary.
**
** *** the bcache is initialized OFF for the burnin test ***
**
** Because we aligned on and fit into a icache block, and because sbe=0,
** and because we do an mb at the beginning (which blocks further progress
** until the entire block has been fetched in), we don't have to
** fool with pulling this code in before executing it.
*/


#undef bc_enable_a 
#undef init_mode_a 
#undef bc_size_a 

#undef zeroblk_enable_a 
#undef enable_evict_a 
#undef set_dirty_enable_a 
#undef bc_bank_enable_a 
#undef bc_wrt_sts_a 


loadwm: 


#define bc_enable_a 0 
#define init_mode_a 0 
#define bc_size_a 0 
#define zeroblk_enable_a 1 
#define enable_evict_a 0 
#define set_dirty_enable_a 0 
#define bc_bank_enable_a 0 
#define bc_wrt_sts_a 0 
        lda r1, WRITE_MANY_CHAIN_H(r31)
        sll r1, 32, r1  /* data<35:32> */
        LDLI(r1, WRITE_MANY_CHAIN_L, r1)  /* data<31:00> */
        addq r31,6,r0  /* shift in 6x 6-bits*/
        mb  /* wait for all istream/dstream to complete*/
        br r31, bccshf

        .align 6
bccshf: mtpr r1,EV6__DATA  /* shift in 6 bits*/
        subq r0,1,r0  /* decrement R0*/
        beq r0,bccend  /* done if R0 is zero*/
        srl r1,6,r1  /* align next 6 bits*/
        br r31,bccshf  /* continue shifting*/
bccend: mtpr r31,EV6__EXC_ADDR + 16  /* dummy IPR write - sets SCBD bit 4 */
        addq r31,r31,r0  /* nop*/
        addq r31,r31,r1  /* nop*/
        mtpr r31,EV6__EXC_ADDR + 16  /* also a dummy IPR write -
                                     /* stalls until above write
                                     /* retires*/
        beq r31, bccnxt  /* predicts fall through in PALmode*/
        br r31, .-4  /* fools ibox predictor into infinite loop*/
        addq r31,r31,r1  /* nop*/
bccnxt: addq r31,4,r0  /* load PCTX..... */
        mtpr r0,EV6__PROCESS_CONTEXT  /* ..... FPE=1 (SCRBRD=4) */
        lda r0,DC_CTL_INIT_K(r31)  /* load DC_CTL..... */
        mtpr r0,EV6__DC_CTL  /* ..... ECC_EN=0, FHIT=0, SET_EN=3
                             /* (SCRBRD=6) */
        addq r31,r31,r0  /* nop*/
        addq r31,r31,r1  /* nop*/
        lda r0,0xff61(r31)  /* R0 = ^xff61 (superpage) */
        zap r0,0xfc,r0  /* PTE protection for DTB write in next
                           block*/
        mtpr r31,EV6__DTB_TAG0  /* write DTB_TAG0 (SCRBRD=2,6)*/
        mtpr r31,EV6__DTB_TAG1  /* write DTB_TAG1 (SCRBRD=1,5)*/
        mtpr r0,EV6__DTB_PTE0  /* write DTB_PTE0 (SCRBRD=0,4)*/
        mtpr r0,EV6__DTB_PTE1  /* write DTB_PTE1 (SCRBRD=3,7)*/
        mtpr r31,EV6__SIRR  /* clear SIRR (SCRBRD=4) */
        lda r0,0x08FF(r31)  /* load FPCR..... */
        sll r0,52,r0  /* ..... initial FPCR value*/
        itoft r0,f0  /* nop itoft r0,f0; value = 0x8FF0000000000000*/
        mt_fpcr f0  /* nop mt_fpcr f0,f0,f0; do the load*/
        lda r0,0x2086(r31)  /* load I_CTL..... */
        ldah r0,0x0050(r0)  /* ..... TB_MB_EN=1, CALL_PAL_R23=1, SL_XMIT=1,
                            /* SBE=0, SDE=2, IC_EN=3*/
        mtpr r0,EV6__I_CTL  /* value = 0x0000000000502086 (SCRBRD=4) */
        mtpr r31,EV6__CC  /* clear CC (SCRBRD=5) */
        lda r0,0x001F(r31)  /* write-one-to-clear bits in HW_INT_CLR,
                            /* I_STAT and DC_STAT*/
        sll r0,28,r0  /* value = 0x00000001F0000000*/
        mtpr r0,EV6__HW_INT_CLR  /* clear bits in HW_INT_CLR (SCRBRD=4) */
        mtpr r0,EV6__I_STAT  /* clear bits in I_STAT
                             /* (SCRBRD=4) creates a map-stall
                             /* under the above mtpr to SCRBRD=4*/
        lda r0,0x001F(r31)  /* value = 0x000000000000001F*/
        mtpr r0,EV6__DC_STAT  /* clear bits in DC_STAT (SCRBRD=6) */
        addq r31,r31,r0  /* nop*/
        mtpr r31,EV6__PCTR_CTL  /* 1st buffer fetch block for above map-stall
                                /* and 1st clear PCTR_CTL (SCRBRD=4) */
        bis r31,1,r0  /* set up value for demon write*/
        bis r31,1,r0  /* set up value for demon write*/
        mulq/v r31,r31,r0  /* nop*/
        mtpr r31,EV6__PCTR_CTL  /* 2nd buffer fetch block for above map-stall
                                /* and 2nd clear PCTR_CTL (SCRBRD=4) */


        bis r31,1,r0  /* set up value for demon write*/
        bis r31,1,r0  /* set up value for demon write*/
        mulq r31,r31,r0  /* nop*/
        lda r0,0x780(r31)  /* this is new initialization stuff to prevent*/
        mb
        whint r0  /* ld/st below from going off-chip */
        mb
        bis r31,1,r0  /* set up value for demon write*/
        ldq_p r1,0x780(r31)  /* flush Pipe 0 LD logic*/
        ldq_p r0,0x788(r31)  /* flush Pipe 1 LD logic*/
        mb  /* wait for LD's to complete*/
        mb  /* wait for LD's to complete*/
        stq_p r1,0x780(r31)  /* flush Pipe 0 ST logic*/
        stq_p r0,0x788(r31)  /* flush Pipe 1 ST logic*/
        bis r31,32,r0  /* load loop count of 32*/
jsr_init_loop:
        bsr r31,jsr_init_loop_nxt  /* JSR to PC+4*/
jsr_init_loop_nxt:
        stq_p r1,0x780(r31)  /* flush Pipe 0 ST logic*/
        subq r0,1,r0  /* decrement loop count*/
        beq r0,jsr_init_done  /* done?*/
        br r31,jsr_init_loop  /* continue loop*/


jsxr_init_done: 


lda r0, 0x03FF (r31) /* create FP one..... wd 
sll r0,52,r0 EP ces wake value = 0x3FF0000000000000 */ 
itoft r0,f0 /* put it into FO reg */ 
addg r31,r3i,rl1 /* nop (also clears R1) */ 
malt £0,£0,f0 /* flush mul-pipe */ 
addt £0,£0, £0 /* flush add-pipe */ 
divt £0, £0, £0 /* flush div-pipe */ 
sqrtt f£0,f0 /* flush div-pipe */ 
cvtgqt f0,£0 /* flush add-pipe (integer logic) */ 
perr r31,r31,r0 /* flush MVI logic */ 
maxuw4 r31,r31,r0 /* flush MVI logic */ 
pkwb r31,r0 /* flush MVI logic */ 
re x0 /* clear interrupt flag*/ 
addq r31,r31,rl1 /* nop (also clears R1) */ 
add@q r31,r31,r1 /* nop (also clears R1)*/ 
addq r31,r31,r1 /* nop (also clears R1)*/ 
/* 
* This palbase init exists for the rare cases 
* when this code is loaded into upper memory. 
* That is the case when this code is loaded 
* and executed in memory on a system that has 
* already been initialized. This technique 
* can sometimes be used to debug snippets of 
* this code. 
 */
br r31,palbase_init

palbase_init:
br r0, br60 /* r0 <- current location */

br60: lda r1, (EntryPoint-br60)(r0) /* r1 <- location of codebase */
mtpr r1, EV6__PAL_BASE /* set up pal_base register */


bis r31, 2, r0
mtpr r0, EV6__VA_CTL


bis pa = a 
mtpr r0, EV6__M_CTL


br r0, jmp0 
jmp0: addq r0, (jmp1-jmp0+1), r0
hw_rets/jmp (r0) 


jmp1: 

lda r1, 1(r31) /* r1 <- cc_ctl enable bit */
sll r1, 32, r1
mtpr r1, EV6__CC_CTL /* Enable/clear the cycle counter. */
/*
** Now initialize the dcache to allow the
** minidebugger to save gpr’s
*/


D.2 Restriction 2: No Multiple Writers to IPRs in Same Scoreboard 
Group 


For convenience of implementation, only one explicit writer (HW_MTPR) to IPRs that 
are in the same group can appear in the same fetch block (octaword-aligned octaword). 
Multiple explicit writers to IPRs that are not in the same scoreboard group can appear. 

If this restriction is violated, the IPR readers might not see the in-order state. Also, the
IPR might ultimately end up with a bad value.
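
The restriction lends itself to a mechanical check over a PALcode listing. The following Python sketch is illustrative only: it assumes each HW_MTPR is described by its byte address and a label for its scoreboard group, and the function name and group labels are hypothetical rather than part of any Compaq tool.

```python
from collections import Counter

FETCH_BLOCK_BYTES = 16  # octaword-aligned octaword: four 4-byte instructions


def find_multiple_writers(writes):
    """Return (fetch_block_base, group) pairs that have more than one writer.

    `writes` is an iterable of (pc, group) tuples, one per HW_MTPR, where
    `group` is a hypothetical label for the IPR's scoreboard group.
    """
    counts = Counter(
        (pc & ~(FETCH_BLOCK_BYTES - 1), group) for pc, group in writes
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Two writers to group 4 in the fetch block at 0x100 violate the restriction.
# Writers to different groups in the same block, or to the same group in
# different blocks, do not.
violations = find_multiple_writers(
    [(0x100, 4), (0x104, 4), (0x108, 6), (0x110, 4)]
)
```

Here `violations` holds the single offending block and group, which is where a NOP or a fetch-block boundary would need to be inserted.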


D.3 Restriction 4: No Writers and Readers to IPRs in Same Scoreboard
Group


This restriction is made for the convenience of microprocessor implementation. 

An explicit reader of an IPR in a particular scoreboard group cannot follow an explicit 
writer (HW_MTPR) to an IPR in that same scoreboard group within one fetch block 
(octaword-aligned octaword). Also within one fetch block, an implicit reader of an IPR 
in a particular scoreboard group cannot follow an explicit writer (HW_MTPR) to an 
IPR in that scoreboard group. This restriction covers writes to DTB_PTE or DTB_TAG 
followed by LD, ST, or any memory operation, including all types of JMP instructions 
and HW_RET instructions that do not have the STALL bit set.


D.4 Guideline 6: Avoid Consecutive Read-Modify-Write-Read-Modify-Write


Avoid consecutive read-modify-write-read-modify-write sequences to IPRs in the same 
scoreboard group. 


The latency between the first write and the second read is determined by the retire 
latency of the IPR. For convenience of implementation, the latency between the time 
when the read is issued and when the final write is issued depends on the run-time con- 
tents of the issue queue. It is somewhere between four and nine cycles, even if there is 
no data dependency between the read and write. 


D.5 Restriction 7: Replay Trap, Interrupt Code Sequence, and STF/
ITOF 


On an Mbox replay trap, the 21264A Ibox guarantees that the refetched load or store 
instruction that caused the trap is issued before any newer load or store instructions. For 
load and integer store instructions, this is a consequence of the natural operation of the 
issue queue. The refetched instruction enters the age-prioritized queue ahead of newer 
load and store instructions and does not have any dependencies on dirty registers. 


Because there is no overhead time for checking these register dependencies (that is, it is 
known upon enqueueing that there are no dirty registers), the queue will issue the 
refetched instruction in priority order. For floating-point store instructions, there is nor- 
mally some overhead associated with checking the floating-point source register dirty 
status, so the store instruction would normally wait before being issued. This would 
have the undesired consequence of allowing newer load and store instructions to be 
issued out of order. A deadlock can occur if issuing the instructions out-of-order causes 
the floating-point store instruction to continually replay the trap. To avoid the deadlock 
on a floating-point store instruction replay trap, the source register dirty status is not 
checked (the source register is assumed to be clean because the store instruction was 
issued previously). 


The hardware mechanism that keeps track of replayed floating-point store instructions, 
and cancels the dirty register check, requires some software restrictions to guarantee 
that it is applied appropriately to the replayed instruction and not to other floating-point 
store instructions. The hardware mechanism marks the position in the fetch block (low 
two bits of the PC) where the replay trap occurred. This action cancels the dirty float- 
ing-point source register check of the next valid instruction enqueued to the integer 
queue (integer, all load and store, and ITOF instructions) that has the same position in 
the fetch block (normally the replayed STF). If the PC is somehow diverted to a PAL- 
code flow, this hardware might inadvertently cancel the register check of some other 
STF (or ITOF) instruction. Fortunately, there are a minimal number of reasons why the 
PC might be diverted during a replay trap. They are interrupts and ITB fills. 


The following PALcode example shows that an STF or ITOF instruction, in a given 
position in a fetch block, must be preceded by a valid instruction that is issued out of 
the integer queue in the same position in an earlier fetch block. Acceptable instruction 
classes include load, integer store, and integer operate instructions that do not have R31 
as a destination or branch.


Bad_interrupt_flow_entry:

ADDQ R31,R31,R0
STF Fa,(Rb)       ; This STF might not undergo a dirty source register
                  ; check and might give wrong results
ADDQ R31,R31,R0
ADDQ R31,R31,R0

Good_interrupt_flow_entry:

ADDQ R31,R31,R0   ; Enables FP dirty source register
                  ; check for (PC[1:0] == 00)
ADDQ R31,R31,R0   ; Enables FP dirty source register
                  ; check for (PC[1:0] == 01)
ADDQ R31,R31,R0   ; Enables FP dirty source register
                  ; check for (PC[1:0] == 10)
ADDQ R31,R31,R0   ; Enables FP dirty source register
                  ; check for (PC[1:0] == 11)
ADDQ R31,R31,R0
STF Fa,(Rb)       ; This STF will successfully undergo
                  ; a dirty source register check
ADDQ R31,R31,R0
ADDQ R31,R31,R0
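
The slot rule in the example above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the flow is straight-line code starting at the PALcode entry point, the instruction classification ("int" for an acceptable integer-queue instruction, "fp_store" for STF or ITOF) is supplied by the caller, and the function name is hypothetical.

```python
def unprotected_fp_stores(flow):
    """Return the PCs of STF/ITOF instructions with no enabling predecessor.

    `flow` is a list of (pc, kind) tuples in fetch order. An "fp_store" at
    fetch-block position PC[1:0] (in instruction units) is considered safe
    only if an "int" instruction occupied the same position earlier in the
    flow, which in straight-line code means in an earlier fetch block.
    """
    enabled_slots = set()
    unprotected = []
    for pc, kind in flow:
        slot = (pc >> 2) & 3  # position within the 4-instruction fetch block
        if kind == "fp_store" and slot not in enabled_slots:
            unprotected.append(pc)
        elif kind == "int":
            enabled_slots.add(slot)
    return unprotected


bad_flow = [(0x00, "int"), (0x04, "fp_store"), (0x08, "int"), (0x0C, "int")]
good_flow = [
    (0x00, "int"), (0x04, "int"), (0x08, "int"), (0x0C, "int"),
    (0x10, "int"), (0x14, "fp_store"), (0x18, "int"), (0x1C, "int"),
]
```

Applied to the two flows above, the function flags the STF at offset 0x04 in the bad entry and nothing in the good entry.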


D.6 Restriction 9: PALmode Istream Address Ranges


PALmode[physical] Istream addresses must ensure proper sign extension for the
selected value of I_CTL[VA_48]. When I_CTL[VA_48] is clear, indicating 43-bit virtual
address format, PALmode[physical] Istream addresses must sign-extend address
bits above bit 42 although the physical address range is 44 bits. An illegal address can
only be generated by a PALmode JSR-type instruction or a HW_RET instruction
returning to a PALmode address.
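
As an illustration of the sign-extension requirement, the Python sketch below tests whether a 64-bit address is properly sign-extended. The 43-bit case (sign bit 42) comes from the text above; treating bit 47 as the sign bit when I_CTL[VA_48] is set is an assumption based on the 48-bit format name, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def is_sign_extended(addr, va_48):
    """True if `addr` is sign-extended from bit 47 (VA_48 set) or bit 42."""
    sign_bit = 47 if va_48 else 42  # bit 47 is an assumption for VA_48 = 1
    upper = (addr & MASK64) >> sign_bit  # the sign bit and all bits above it
    all_ones = (1 << (64 - sign_bit)) - 1
    return upper == 0 or upper == all_ones


# Bit 42 set with the upper bits clear is not a legal 43-bit-format address.
illegal = 0x0000_0400_0000_0000
# The same address with bits 63:43 set to match bit 42 is legal.
legal = 0xFFFF_FC00_0000_0000
```

A check of this kind would apply to the target of a PALmode JSR-type instruction or of a HW_RET returning to a PALmode address.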


D.7 Restriction 10: Duplicate IPR Mode Bits 


The virtual address size is selectable by programming IPR bits I_CTL[VA_48]
and VA_CTL[VA_48]. These bit values should usually be equal when operating in
native (virtual) mode. The I_CTL[VA_48] bit determines the DTB double3/double4
PALcode entry, the JSR mispredict comparison width, the VPC address generation
width, the Istream ACV limits, and the IVA_FORM format selection. The
VA_CTL[VA_48] bit determines the VA_FORM format selection and the Dstream
ACV limits. IPR mode bits I_CTL[VA_FORM_32] and VA_CTL[VA_FORM_32]
should be consistent when executing in native mode.


D.8 Restriction 11: Ibox IPR Update Synchronization


When updating any Ibox IPR, a return to native (virtual) mode should use the HW_RET 
instruction with the associated STALL bit set to ensure that the updated IPR value 
affects all instructions following the return path. The new IPR value takes effect only 
after the associated HW_MTPR instruction is retired. 


For updates to some IPR fields with propagation delay, such as I_CTL[SDE] and
PCTX[FPE], the method described in Section D.32 is the preferred method of
synchronization.


D.9 Restriction 12: MFPR of Implicitly-Written IPRs EXC_ADDR, 
IVA_FORM, and EXC_SUM 


Implicitly written IPRs are non-renamed hardware registers that must be available for 
subsequent traps. After any trap to PALcode, hardware protects the values from a second
implicit write by locking these registers and delaying subsequent traps for a safe
(limited) time. Their values can be read reliably by a HW_MFPR within the first four
instructions of a PALcode flow and prior to any taken branch in that PALcode flow, 
whichever is earlier. These instructions should not include PALmode trapping instruc- 
tions. After the delimiting instruction defined above retires, these registers are unlocked 
and may change due to new exception conditions. 


If a second exception occurs before the registers are unlocked, it will be either delayed 
or forced to replay trap (a non-PALmode trap) until the register has been unlocked. 
After being unlocked, a subsequent new path exception condition will be allowed to 
reload the register and trap to PALcode. The 21264A may complete execution of the 
first PALcode flow, encountering the second exception condition before the delimiting 
instruction is retired, hence the need for the locking mechanism to ensure visibility of 
the initial register value. 


The VA_FORM, VA, and MM_STAT registers are not included in this list of protected
IPRs. See Section D.24 for a description of how to protect these IPRs from subsequent
implicit writers. 


D.10 Restriction 13: DTB Fill Flow Collision


Two DTB fill flows might collide such that the HW_MTPR’s in the second fill could be 
issued before all of the HW_MTPR’s in the first PALcode flow are retired. This can be 
prevented by putting appropriate software scoreboard barriers in the PALcode flow. 


D.11 Restriction 14: HW_RET 


There can be no HW_RET in the first fetch block of a PALcode routine, other than
CALL_PAL routines. With a HW_RET in the first fetch block of a PALcode routine,
the HW_RET will be mispredicted and the JSR/RETURN stack could lose its
synchronization.


D.12 Guideline 16: JSR-BAD VA


A JSR memory format instruction that generates a bad VA (IACV) trap requires PAL- 
code assistance to determine the correct exception address. If the 
EXC_SUM[BAD_IVA] is set, bits [63:1] of the exception address are valid in the VA
IPR and not the EXC_ADDR as usual. The PALmode bit, however, is always located in 
EXC_ADDR[0] and must be combined, if necessary, by PALcode to determine the full 
exception address. 
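
The combination step can be written out explicitly. The Python sketch below models what PALcode has to compute, assuming the three inputs have already been read from EXC_ADDR, VA, and EXC_SUM[BAD_IVA]; the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def full_exception_address(exc_addr, va, bad_iva):
    """Return the exception address, including the PALmode bit in bit 0.

    When BAD_IVA is set, bits [63:1] are taken from the VA IPR and bit 0
    (the PALmode bit) from EXC_ADDR. Otherwise EXC_ADDR is used as is.
    """
    if bad_iva:
        return ((va & ~1) | (exc_addr & 1)) & MASK64
    return exc_addr & MASK64
```

For example, with BAD_IVA set, VA = 0xABCDEF00 and EXC_ADDR[0] = 1 give an exception address of 0xABCDEF01.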


D.13 Restriction 17: MTPR to DTB_TAG0/DTB_PTE0/DTB_TAG1/
DTB_PTE1


These four write operations must be executed atomically, that is, either all four must be 
retired or none of them may be retired. 


D.14 Restriction 18: No FP Operates, FP Conditional Branches, 
FTOI, or STF in Same Fetch Block as HW_MTPR 


For convenience of implementation, no FP operate instructions, FP conditional 
branches, FTOI register move instructions, or FP store instructions are allowed in the 
same fetch block as any HW_MTPR instructions. This includes ADDx/MULx/DIVx/ 
SQRTx/FPConditionalBranch/STx/FTOIx, where x is any applicable FP data type, but 
does not include LDx/ITOFx. 
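
A listing can be screened for this restriction with a per-fetch-block scan. The Python sketch below is illustrative only: the instruction class names and the function name are hypothetical labels chosen for the example, and the caller is assumed to have classified each instruction already.

```python
# Classes that may not share a fetch block with HW_MTPR, per the text above.
RESTRICTED_FP = {"fp_operate", "fp_branch", "ftoi", "fp_store"}


def blocks_mixing_mtpr_and_fp(flow):
    """Return base addresses of fetch blocks that mix HW_MTPR with FP work.

    `flow` is an iterable of (pc, kind) tuples. A fetch block is the
    octaword-aligned octaword containing `pc`.
    """
    kinds_by_block = {}
    for pc, kind in flow:
        kinds_by_block.setdefault(pc & ~0xF, set()).add(kind)
    return sorted(
        base
        for base, kinds in kinds_by_block.items()
        if "hw_mtpr" in kinds and kinds & RESTRICTED_FP
    )


example = [
    (0x200, "hw_mtpr"), (0x204, "fp_store"),  # same block: violation
    (0x210, "hw_mtpr"), (0x214, "fp_load"),   # LDx is explicitly allowed
    (0x220, "fp_operate"),                    # no HW_MTPR in this block
]
```

Only the block at 0x200 is reported for the example flow.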


D.15 Restriction 19: HW_RET/STALL After Updating the FPCR by 
way of MT_FPCR in PALmode 


FPCR updating occurs in hardware based on the retirement of a nontrapping version of 
MT_FPCR (in PALcode). Use a HW_RET/STALL after the nontrapping MT_FPCR to 
achieve minimum latency (four cycles) between the retiring of the MT_FPCR and the 
first FLOP that uses the updated FPCR. 


D.16 Guideline 20: I_CTL[SBE] Stream Buffer Enable


The I_CTL[SBE] bits should not be enabled when running with the Icache disabled to 
avoid potentially long fill delays. When the Icache is disabled, the only method of sup- 
plying instructions is by way of a stream hit. If the fill is returned in non-sequential 
wrap order, the stream will continue fetching through the entire page while waiting for 
a hit. Normally the data will be found in the cache. 


D.17 Restriction 21: HW_RET/STALL After HW_MTPR ASN0/ASN1


There must be a scoreboard bit-to-register dependency chain to prevent HW_MTPR
ASN0 or HW_MTPR ASN1 from being issued while any of scoreboard bits [7:4] are
set. The following example contains a code sequence that creates the dependency chain.

;Assume R9 holds value to write to ASN0/ASN1
HW_MFPR R0, VA, SCBD<7,6,5,4>
XOR R0, R0, R0
BIS R0, R9, R9
BIS R31, R31, R31
HW_MTPR R9, ASN0, SCBD<4>
HW_MTPR R9, ASN1, SCBD<7>

This sequence guarantees, through the register dependency on R0, that neither
HW_MTPR is issued before scoreboard bits [7:4] are cleared. In addition, there must
be a HW_RET/STALL after a HW_MTPR ASN0/HW_MTPR ASN1 pair. Finally,
these two writes must be executed atomically, that is, either both must be retired or
neither may be retired.


D.18 Restriction 22: HW_RET/STALL After HW_MTPR IS0/IS1 


There must be a scoreboard bit-to-register dependency chain to prevent either
HW_MTPR IS0 or HW_MTPR IS1 from being issued while any of scoreboard
bits [7:4] are set. The following example contains a code sequence that creates the
dependency chain.

HW_MFPR R0, VA, SCBD<7,6,5,4>,R0
XOR R0, R0, R0
BIS R0, R9, R9
BIS R31, R31, R31
HW_MTPR R9, IS0, SCBD<6>
HW_MTPR R9, IS1, SCBD<7>

This sequence guarantees, through the register dependency on R0, that neither
HW_MTPR is issued before scoreboard bits [7:4] are cleared. There must be a
HW_RET/STALL after an HW_MTPR IS0/HW_MTPR IS1 pair. Also, these two
writes must be executed atomically, that is, either both must be retired or neither may be
retired.


D.19 Restriction 23: HW_ST/P/CONDITIONAL Does Not Clear the 
Lock Flag 


A HW_ST/P/CONDITIONAL will not clear the lock flag such that a successive store- 
conditional (either STx_C or HW_ST/C) might succeed even in the absence of a load- 
locked instruction. In the 21264A, a store-conditional is forced to fail if there is an 
intervening memory operation between the store-conditional and its address-matching 
LDxL. The following example shows the memory operations. 

LDL/Q/F/G/S/T 

STL/Q/F/G/S/T 

LDQ_U (not to R31)

STQ_U 


Absent from this list are HW_LD (any type), HW_ST (any type), ECB, and WH64. 
Their absence implies that they will not force a subsequent store-conditional instruction 
to fail. PALcode must insert a memory operation from the above list after a HW_ST/ 
CONDITIONAL in order to force a future store-conditional to fail if it was not pre- 
ceded by a load-locked operation: 


HW_LDxL
xxx
HW_ST/C -> R0
Bxx R0, try_again
STQ ; Force next ST/C to fail if no preceding LDxL
HW_RET


D.20 Restriction 24: HW_RET/STALL After HW_MTPR IC_FLUSH, 
IC_FLUSH_ASM, CLEAR_MAP 


There must be a HW_RET/STALL after a HW_MTPR IC_FLUSH, IC_FLUSH_ASM, or 
CLEAR_MAP. The Icache flush associated with these instructions will not occur until 
the HW_RET/STALL occurs and all outstanding Istream fetches have been completed. 


Also, there must be a guarantee that the HW_MTPR IC_FLUSH or HW_MTPR 
IC_FLUSH_ASM will not be retired simultaneously with the HW_RET/STALL. This 
can be ensured by inserting a conditional branch between the two (BNE R31, 0 cannot 
be mispredicted in PALmode), or by ensuring at least 10 instructions between the 
MTPR instruction and the HW_RET/STALL containing at least one instruction in each 
quad aligned group with a valid destination. Finally, the HW_RET/STALL that is used 
for CLEAR_MAP cannot trigger a cache flush. That is, if both a CLEAR_MAP and 
IC_FLUSH are desired, there must be two HW_RET/STALLs, one following each 
HW_MTPR. 


D.21 Restriction 25: HW_MTPR ITB_IA After Reset 


An HW_MTPR ITB_IA is required in the reset PALcode to initialize the ITB. It is also 
required that PALcode not be exited, even via a mispredicted path until this 
HW_MTPR ITB_IA has been retired. PALmode can change temporarily after fetching 
a HW_RET, regardless of the STALL qualifier, down a mispredicted path leading to use 
of the ITB before it is actually initialized. 


Unexpected instruction fetch and execution can occur following misprediction of any 
memory format control instruction (JMP, JSR, RET, JSR_CO, or HW_JMP, HW_JSR, 
HW_RET, HW_JSR_CO regardless of the STALL qualifier), or after any mispredicted 
conditional branch instruction. If the unexpected instruction flow contains a HW_RET 
instruction, PALmode may be exited prematurely. 


One way to ensure that PALmode is not exited is to place the HW_MTPR ITB_IA at 
least 80 instructions before any possible HW_RET instruction can be encountered via 
any fetch path. Since memory format control instructions can mispredict to any cache 
location, they should also be avoided within these 80 instructions. 


D.22 Guideline 26: Conditional Branches in PALcode 


To avoid pollution of the branch predictors and improve overall branch prediction
accuracy, conditional branch instructions in PALcode will be predicted to not be taken. The
only exceptions to this rule are conditional branches within the first cache fetch (up to
four instructions) of all PALcode flows except CALL_PAL flows. Conditional branches
should be avoided in this window.


D.23 Restriction 27: Reset of ‘Force-Fail Lock Flag’ State in PALcode 


A virtual mode load or store is required in PALcode before the execution of any load- 
locked or store-conditional instructions. The virtual-mode load or store may not be a 
HW_LD, HW_ST, LDx_L, ECB, or WH64. 


D.24 Restriction 28: Enforce Ordering Between IPRs Implicitly Writ- 
ten by Loads and Subsequent Loads 


Certain IPRs, which are updated as a result of faulting memory operations, require soft- 
ware assistance to maintain ordering against newer instructions. Consider the following 
code sequence: 


HW_MFPR IPR_MM_STAT
LDQ rx, (ry) 


These instructions would typically be issued in-order. The HW_MFPR is data-ready 
and both instructions use a lower subcluster. However, the HW_MFPRs (and 
HW_MTPRs) respond to certain resource-busy indications and are not issued when the 
Mbox informs the Ibox that a certain set of resources (store-bubbles) are busy. The LDs 
respond to a different set of resource-busy indications (load-bubbles) and could be 
issued around the HW_MFPR in the presence of the former. Software assistance is
required to enforce the issue order. One sure way to enforce the issue order is to insert 
an MB instruction before the first load that occurs after the HW_MFPR MM_STAT. 
The VA, VA_FORM, and DC_CTL registers require a similar constraint. All LOAD 
instructions except HW_LD might modify any or all of these registers. HW_LD does 
not modify MM_STAT. 


D.25 Guideline 29: JSR, JMP, RET, and JSR_COR in PALcode


Unprivileged JSR, JMP, RET, and JSR_COR instructions will always mispredict when 
used in PALcode. In addition, HW_RET to a PALmode target will always mispredict 
since the JSR stack only predicts native-mode return addresses. HW_RET to a native- 
mode target uses the JSR stack for prediction and should usually be used when exiting 
PALmode in order to maintain JSR stack alignment since all PALmode traps also push 
the value of the EXC_ADDR on the JSR stack. 


Privileged versions of the JSR type instructions (HW_JSR,HW_JMP,HW_JSR_COR) 
can be used both within PALmode or to exit PALmode and generate a predicted target 
based on their hint bits and the current processor PALmode state.


D.26 Restriction 30: HW_MTPR and HW_MFPR to the Cbox CSR


External bus activity must be isolated from writes and reads to the Cbox CSR. This 
requires that all Dstream and Istream fills must be avoided until after the HW_MTPR/ 
HW_MFPR updates are completed. An MB instruction can block Dstream activity, but 
blocking all Istream fills, including prefetches, requires more extensive code. The following
code example blocks all Istream fill requests and stalls instruction fetch until
after the desired MTPR/MFPR action is completed. This code assumes that Istream 
prefetching has already been disabled by way of a HW_MTPR to I_CTL[SBE], 
IC_FLUSH, and HW_RET_STALL sequence.


;Macro to stop Ibox from sending requests to Cbox
;Replace i1-i6 with instruction(s) you want done with Ibox quiet
;This macro is assumed to be used in PALmode
;
.macro stop_ibox i1=nop, i2=nop, i3=nop, i4=nop, i5=nop, -
        i6=nop, ?touch1, ?touch2, ?touch3, ?over1, ?over2, ?start, ?end

        .align 6, 0x47ff041f    ;Next 16 instructions must be cache block
                                ; aligned, fill with BIS NOP
        br r31, touch1          ;Skip around and pull next 4 block into Icache
start:  i1
        i2
        br r31, over1           ;Skip over touch stuff
touch1: br r31, touch2          ;Skip around and pull next block into Icache
over1:  i3
        i4
        br r31, over2
touch2: br r31, touch3          ;Skip around and pull next block into Icache
over2:  i5
        i6
        hw_mtpr r31, <0x0610>   ;Dummy IPR write - sets SCBD bit 4
        hw_mtpr r31, <0x0610>   ;Stalled until above writes retire
        beq r31, end            ;Predicts fall through since in PALmode
        br r31, .-4             ;fools Ibox predictor into infinite loop
touch3: br r31, start
end:
.endm


D.27 Restriction 31: I_CTL[VA_48] Update


The VA_48 virtual address format cannot be changed while executing a JSR, JMP, 
GOTO, JSR_COROUTINE, or HW_RET instruction. A simple method of ensuring 
that the address does not change is to write I_CTL twice, in two separate fetch blocks, 
with the same data. The second write will stall the pipeline and ensure that the mode 
cannot change, even down a mispredicted path, while a following JSR type instruction 
might be using the address comparison logic. 


D.28 Restriction 32: PCTR_CTL Update


The performance counter must not be left in a state near overflow. If counting is dis- 
abled, the counters may produce multiple overflow signals if the counter output is not 
updated due to the counter being disabled. A repeated overflow signal with counters 
disabled can block other incoming interrupt requests while the overflow state persists. 
To avoid this situation, reads or writes to the counters should not leave a value near 
overflow. In normal operation, with counters enabled, a counter overflow will produce 
an overflow pulse, clear the counter, and produce a performance counter interrupt. 
Interrupts can only be blocked for one cycle.


D.29 Restriction 33: HW_LD Physical/Lock Use


The HW_LD physical/lock instruction must be one of the first three instructions in a 
quad-instruction aligned fetch block. A pipeline error can occur if the HW_LD physi- 
cal/lock is fetched as the fourth instruction of the fetch block. 
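
The placement rule reduces to a check on the low bits of the instruction address. A minimal Python sketch, assuming 4-byte instructions and a 16-byte aligned fetch block as used elsewhere in this appendix (the function name is hypothetical):

```python
def hw_ld_lock_slot_ok(pc):
    """True if the instruction at byte address `pc` is one of the first
    three instructions of its quad-instruction aligned fetch block."""
    slot = (pc >> 2) & 3  # 0..3, position within the fetch block
    return slot != 3
```

An assembler-time check like this would reject a HW_LD physical/lock at, for example, offset 0x10C and accept one at 0x108.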


D.30 Restriction 34: Writing Multiple ITB Entries in the Same PALcode
Flow


Before a PALcode flow writes multiple ITB entries, additional scoreboard bits should 
be set to avoid possible corruption of the TAG IPR prior to final update in the ITB. The 
addition of scoreboard bits 0 and 4 to the standard scoreboard bit 6 for ITB_TAG will 
prevent subsequent HW_MTPR ITB_TAG writes from changing the staging register 
TAG value prior to retirement of the HW_MTPR ITB_PTE that triggers the final ITB 
update. 


D.31 Guideline 35: HW_INT_CLR Update 


When writing the HW_INT_CLR IPR to clear interrupt requests, it may be necessary to 
write the same value twice in distinct fetch blocks to ensure that the interrupt request is 
cleared before exiting PALcode. A second write will cause a scoreboard stall until the 
first write retires, creating a convenient synchronization with the PALmode exit. 


D.32 Restriction 36: Updating I_CTL[SDE]


A software interlock is required between updates of the I_CTL[SDE] and a subsequent
instruction fetch that may use any destination registers. A suggested method of ensuring 
this interlock is to use two MTPR I_CTL instructions in separate fetch blocks, followed 
by three more fetch blocks of non-NOP instructions. 


D.33 Restriction 37: Updating VA_CTL[VA_48]


A software interlock is required between updates of the VA_CTL[VA_48] and follow- 
ing LD or ST instructions. This is necessary since the VA_CTL update will not occur 
until the HW_MTPR VA_CTL instruction retires. A sufficient method of ensuring this 
interlock is to write the VA_CTL with the same data in two successive fetch blocks, 
causing a mapper stall. The dependent LD or ST instructions can be placed in any
location of the second fetch block.


D.34 Restriction 38: Updating PCTR_CTL


When updating the PCTR_CTL, it may be necessary to write the update value twice. If 
the counter being updated is currently disabled by way of the respective I_CTL or 
PCTX bits, the value must be written twice to ensure that the counter overflow is prop- 
erly cleared. The overflow bit is conditionally latched using the same write enable as 
the counter update, so an additional write of the counter value will ensure that the over- 
flow logic accurately reflects the addition of the new counter value plus the input condi- 
tions. The new update value must not be within one cycle of overflow (within 16 for 
SL0, within 4 for SL1) as required by Section D.28.


D.35 Guideline 39: Writing Multiple DTB Entries in the Same PAL 
Flow 


If a PALcode flow intends to write multiple DTB entries (as would occur in a double 
miss), it must take care to keep subsequent HW_MTPR DTB_TAGx writes from cor- 
rupting the staging register TAG values prior to retirement of the HW_MTPR 
DTB_PTEx, which triggers the final DTB update. 


For example, in the double miss DTB flow, the following code could be used to hold up 
the return to the single miss flow (the numbers in parentheses are the scoreboard bits): 


hw_mtpr r4, EV6__DTB_TAG0 ; (2&6) write tag0
hw_mtpr r4, EV6__DTB_TAG1 ; (1&5) write tag1
hw_mtpr r5, EV6__DTB_PTE0 ; (0&4) write pte0
hw_mtpr r5, EV6__DTB_PTE1 ; (3&7) write pte1
bis r31, r31, r31 ; force new fetch block
bis r31, r31, r31
bis r31, r31, r31
hw_mtpr r31, <EV6__MM_STAT ! ^x80> ; (7) wait for pte write

hw_ret (r6) ; return to single miss
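
The scoreboard masks OR'ed into the IPR operand in these examples follow a simple bit-per-scoreboard-entry pattern: ^x80 corresponds to bit 7, and the <0x0610> operand in Section D.26 is described as setting SCBD bit 4. The Python sketch below reproduces that arithmetic. Treating the low byte of the operand as the scoreboard mask matches those two examples; treating the upper byte as the IPR index is an assumption, and both function names are hypothetical.

```python
def scoreboard_mask(bits):
    """Build the 8-bit scoreboard mask for an iterable of bit numbers 0..7."""
    mask = 0
    for bit in bits:
        if not 0 <= bit <= 7:
            raise ValueError("scoreboard bit out of range")
        mask |= 1 << bit
    return mask


def split_ipr_operand(operand):
    """Split a 16-bit operand into (upper byte, list of scoreboard bits).

    The upper byte is assumed here to be the IPR index.
    """
    mask = operand & 0xFF
    return operand >> 8, [bit for bit in range(8) if mask & (1 << bit)]
```

With these helpers, a wait on scoreboard bit 7 uses a mask of 0x80, and a wait on bits 4 and 6 together uses 0x50.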


D.36 Restriction 40: Scrubbing a Single-Bit Error 


On Bcache and Memory single bit errors on Icache fills, the hardware flushes the 
Icache, but the PALcode must scrub the block in the Bcache and memory. On Bcache 
and Memory single bit errors on Dcache fills, the hardware scrubs the Dcache as long 
as the error was on a target quadword, but the PALcode must scrub the Dcache for non- 
target quadwords, and must in general scrub the block in the Bcache and memory. 


The scrub consists of reading each quadword in the block, with at least one exclusive 
access load/store to ensure the corrected data will be scrubbed in Bcache and memory. 
The scrub itself causes a CRD to be flagged, which is cleared by the PALcode before 
exiting to native mode. 


; Sample code for scrubbing a single bit error.
;
; Since we only have the block address, and the hardware only corrects
; target quadwords, we read each quadword.
; In order to ensure eviction to bcache and memory, a store
; is needed to mark the block dirty. An exclusive access is
; used to ensure we scrub in main memory. Virtual access is
; used because of restrictions in use of hw_ld/hw_st lock
; instructions.
; After the scrub, read the cbox chain again.
; The scrub will cause a crd, but will get cleared with a write
; to hw_int_clr.
;
; Current state:
;       r5      base of crd logout frame
;
        hw_ldq/p r4, MCHK_CRD__C_ADDR(r5)       ; get address back
        bis     r31, r31, r31
        bis     r31, r31, r31
        bis     r31, r31, r31

        hw_mtpr r31, EV6__DTB_IA                ; (7,1L) flush dtb
        lda     r20, ^x3301(r31)                ; set WE, RE
        bis     r31, r31, r31
        bis     r31, r31, r31

        hw_mtpr r31, <EV6__MM_STAT ! ^x80>      ; wait for retire
        srl     r4, #13, r6                     ; shift byte offset
        sll     r6, #EV6__DTB_PTE0__PFN__S, r6  ; shift into position
        bis     r6, r20, r6                     ; produce pte

        hw_mtpr r4, EV6__DTB_TAG0               ; (2&6,0L) write tag0
        hw_mtpr r4, EV6__DTB_TAG1               ; (1&5,1L) write tag1
        hw_mtpr r6, EV6__DTB_PTE0               ; (0&4,0L) write pte0
        hw_mtpr r6, EV6__DTB_PTE1               ; (3&7,1L) write pte1

        bis     r31, r31, r31                   ; quiet before we start
        bis     r31, r31, r31
        bis     r31, r31, r31

        ldq     r6, ^x00(r4)                    ; re-read the bad block QW #0
        ldq     r6, ^x08(r4)                    ; re-read the bad block QW #1
        ldq     r6, ^x10(r4)                    ; re-read the bad block QW #2
        ldq     r6, ^x18(r4)                    ; re-read the bad block QW #3
        ldq     r6, ^x20(r4)                    ; re-read the bad block QW #4
        ldq     r6, ^x28(r4)                    ; re-read the bad block QW #5
        ldq     r6, ^x30(r4)                    ; re-read the bad block QW #6
                                                ; no other mem-ops till done
        ldq_l   r6, ^x38(r4)                    ; re-read the bad block QW #7
        stq_c   r6, ^x38(r4)                    ; now store it to force scrub
        mb
        and     r6, r31, r6                     ; consumer of above

        beq     r6, sys__crd_scrub_done         ; these 2 lines......
        br      r31, .-4                        ; ...... stop pre-fetching

sys__crd_scrub_done:
        bsr     r7, sys__cbox                   ; clean the cbox error chain
        bis     r31, r31, r31

        hw_mtpr r31, EV6__DTB_IA                ; (7,1L) flush dtb
        bis     r31, r31, r31
        bis     r31, r31, r31
        bis     r31, r31, r31

        hw_mtpr r31, <EV6__MM_STAT ! ^x80>      ; wait for retire
        bis     r31, #1, r7                     ; get a 1
        sll     r7, #EV6__HW_INT_CLR__CR__S, r7 ; shift into position
        hw_mtpr r7, EV6__HW_INT_CLR             ; (4,0L) clear crd

        lda     r7, EV6__DC_STAT_W1C_CRD(r31)   ; W1C bits
        hw_mtpr r7, EV6__DC_STAT                ; (6,0L)
        bis     r31, r31, r31
        bis     r31, r31, r31

        hw_mtpr r31, <EV6__MM_STAT ! ^x50>      ; stall till they retire


E 


21264A-to-Bcache Pin Interconnections 





This appendix provides the pin interface between the 21264A and Bcache SSRAMs. 


E.1 Forwarding Clock Pin Groupings 


Table E-1 lists the correspondence between the clock signals for the 21264A and
Bcache (late-write non-bursting and dual-data rate) SSRAMs.


Table E-1 Bcache Forwarding Clock Pin Groupings 


Pad and Pin                 Input Clock          Output Clocks 
BcData_H[71:64,7:0]         BcDataInClk_H[0]     BcDataOutClk_x[0] 
BcCheck_H[8,0]              BcDataInClk_H[0]     BcDataOutClk_x[0] 
BcData_H[79:72,15:8]        BcDataInClk_H[1]     BcDataOutClk_x[0] 
BcCheck_H[9,1]              BcDataInClk_H[1]     BcDataOutClk_x[0] 
BcData_H[87:80,23:16]       BcDataInClk_H[2]     BcDataOutClk_x[1] 
BcCheck_H[10,2]             BcDataInClk_H[2]     BcDataOutClk_x[1] 
BcData_H[95:88,31:24]       BcDataInClk_H[3]     BcDataOutClk_x[1] 
BcCheck_H[11,3]             BcDataInClk_H[3]     BcDataOutClk_x[1] 
BcData_H[103:96,39:32]      BcDataInClk_H[4]     BcDataOutClk_x[2] 
BcCheck_H[12,4]             BcDataInClk_H[4]     BcDataOutClk_x[2] 
BcData_H[111:104,47:40]     BcDataInClk_H[5]     BcDataOutClk_x[2] 
BcCheck_H[13,5]             BcDataInClk_H[5]     BcDataOutClk_x[2] 
BcData_H[119:112,55:48]     BcDataInClk_H[6]     BcDataOutClk_x[3] 
BcCheck_H[14,6]             BcDataInClk_H[6]     BcDataOutClk_x[3] 
BcData_H[127:120,63:56]     BcDataInClk_H[7]     BcDataOutClk_x[3] 
BcCheck_H[15,7]             BcDataInClk_H[7]     BcDataOutClk_x[3] 
BcTag_H[42:20]              BcTagInClk_H         BcTagOutClk_x 
BcTagParity_H               BcTagInClk_H         BcTagOutClk_x 
BcTagShared_H               BcTagInClk_H         BcTagOutClk_x 
BcTagDirty_H                BcTagInClk_H         BcTagOutClk_x 
BcTagValid_H                BcTagInClk_H         BcTagOutClk_x 
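
The data and check-bit rows of Table E-1 follow a regular pattern, which the sketch below expresses as index arithmetic. It is only an illustration derived from the table rows; the function names are hypothetical and do not come from this specification.

```python
def bcache_data_clocks(bit: int) -> tuple[int, int]:
    """Return (input clock index, output clock index) for BcData_H[bit].

    Derived from the rows of Table E-1: bits n and n+64 share a group,
    each input clock covers 8 bits of a 64-bit half, and each output
    clock covers 16 bits of a 64-bit half.
    """
    if not 0 <= bit <= 127:
        raise ValueError("BcData_H bit index must be in 0..127")
    lane = bit % 64
    return lane // 8, lane // 16


def bcache_check_clocks(bit: int) -> tuple[int, int]:
    """Return (input clock index, output clock index) for BcCheck_H[bit].

    Check bits k and k+8 share a group; each input clock covers one such
    pair and each output clock covers two pairs.
    """
    if not 0 <= bit <= 15:
        raise ValueError("BcCheck_H bit index must be in 0..15")
    lane = bit % 8
    return lane, lane // 2


if __name__ == "__main__":
    # BcData_H[87] sits in the row BcData_H[87:80,23:16].
    print(bcache_data_clocks(87))
    # BcCheck_H[10] sits in the row BcCheck_H[10,2].
    print(bcache_check_clocks(10))
```

The tag pins are not covered because they all share the single BcTagInClk_H and BcTagOutClk_x pair.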




E.2 Late-Write Non-Bursting SSRAMs 


Table E-2 provides the data pin connections between late-write non-bursting SSRAMs 
and the 21264A or the system board. Table E-3 provides the same information for the 
tag pins. 


Data Pin Usage 


Table E-2 Late-Write Non-Bursting SSRAMs Data Pin Usage 


21264A Signal Name or Board Connection           Late-Write SSRAM Data Pin Name 
BcAdd_H[21:4]                                    SA_H[17:0] 
BcDataOutClk_H[3:0]                              CK_H 
Set from board to 1/2 the 21264A core voltage    CK_L 
BcData_H[127:0]/BcCheck_H[15:0]                  DQx 
BcDataWr_L                                       SW_L 
Unconnected                                      Tck_H 
Unconnected                                      Tdo_H 
Unconnected                                      Tms_H 
Unconnected                                      Tdi_H 
From board, pull down to VSS                     G_L 
From board, pull down to VSS                     SBx_L 
From board, pull down to VSS or BcDataOE_L       SS_L (Vendor dependent) 


Tag Pin Usage 


Unused Bcache tag pins should be pulled to ground through a 200-ohm resistor. 


Table E-3 Late-Write Non-Bursting SSRAMs Tag Pin Usage 


21264A Signal Name or Board Connection           Late-Write SSRAM Tag Pin Name 
BcAdd_H[22:6]                                    SA_H[16:0] 
BcTag_H[42:20]                                   DQx 
BcTagOE_L or from board, pull down to VSS        SS_L (Vendor dependent) 
BcTagWr_L                                        SW_L 
From board, pull down to VSS                     SBx_L 
BcTagOutClk_H                                    CK_H 
Set from board to 1/2 the 21264A core voltage    CK_L 
Set from board to 1/2 the 21264A core voltage    VREF1_H 
                                                 VREF2_H 
Set from board (implementation dependent)        ZQ_H 
BcTagValid_H                                     DQx 
BcTagDirty_H                                     DQx 
BcTagShared_H                                    DQx 
Unconnected                                      TMS_H 
Unconnected                                      TDI_H 
Unconnected                                      TCK_H 
Unconnected                                      TDO_H 


E.3 Dual-Data Rate SSRAMs 


Table E-4 provides the data pin connections between dual-data rate SSRAMs and the 
21264A or the system board. Table E-5 provides the same information for the tag pins. 


Data Pin Usage 


Table E-4 Dual-Data Rate SSRAM Data Pin Usage 


21264A Signal Name or Board Connection           Dual-Data Rate SSRAM Data Pin Name 
BcAdd_H[21:4]                                    SA_H[17:0] 
BcData_H[33:20]/BcCheck_H[15:0]                  DQx 
BcLoad_L                                         LD_L (B1) 
BcDataWr_L                                       R/W_L (B2) 
From board, pulled up to VDD                     LBO_L 
From board, pulled down to VSS                   QL 
BcDataInClk_H                                    CQ_H 
BcDataOutClk_H                                   CK_H 
BcDataOutClk_L                                   CK_L 
Set from board to 1/2 the 21264A core voltage    VREF1_H 
                                                 VREF2_H 
Set from board (implementation-dependent)        ZQ_H 
Unconnected or terminated                        CQ_L 
From board, pulled up to VDD                     TCK_H 
Unconnected                                      TDO_H 
From board, pulled up to VDD                     TMS_H 
From board, pulled up to VDD                     TDI_H 
Unconnected or pulled down to VSS                TRST_L 
BcDataOE_L                                       OE_L (G_L) 
From board, pulled down to VSS                   SD/DD_L (B3) 


Tag Pin Usage 


Unused Bcache tag pins should be pulled to ground through a 200-ohm resistor. 


Table E-5 Dual-Data Rate SSRAMs Tag Pin Usage 


21264A Signal Name or Board Connection           Dual-Data Rate SSRAM Tag Pin Name 
BcAdd_H[23:6]                                    SA_H[17:0] 
BcTag_H[33:20]                                   DQx 
BcTagOE_L                                        LD_L (B1) 
BcTagWr_L                                        R/W_L (B2) 
From board, pulled up to VDD                     LBO_L 
From board, pulled down to VSS                   QL 
                                                 SA[19:18] 
BcTagInClk_H                                     CQ_H 
BcTagOutClk_H                                    CK_H 
BcTagOutClk_L                                    CK_L 
Set from board to 1/2 core voltage               VREF1_H 
                                                 VREF2_H 
Set from board (implementation-dependent)        ZQ_H 
BcTagValid_H                                     DQx 
BcTagDirty_H                                     DQx 
BcTagShared_H                                    DQx 
BcTagParity_H                                    DQx 
Unconnected or terminated                        CQ_L 
From board, pulled up to VDD                     TCK_H 
Unconnected                                      TDO_H 
From board, pulled up to VDD                     TMS_H 
From board, pulled up to VDD                     TDI_H 
Unconnected                                      TRST_L 
From board, pulled down to VSS                   OE_L (G_L) 
From board, pulled up to VDD                     SD/DD_L (B3) 
Glossary 


This glossary provides definitions for specific terms and acronyms associated with the 
Alpha 21264A microprocessor and chips in general. 


abort 


The unit stops the operation it is performing, without saving status, to perform some 
other operation. 


address space number (ASN) 


An optionally implemented register used to reduce the need for invalidation of cached 
address translations for process-specific addresses when a context switch occurs. ASNs 
are processor specific; the hardware makes no attempt to maintain coherency across 
multiple processors. 


address translation 


The process of mapping addresses from one address space to another. 


ALIGNED 
A datum of size 2**N is stored in memory at a byte address that is a multiple of 2**N 
(that is, one that has N low-order zeros). 
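
The ALIGNED definition above reduces to a mask test on the low-order N address bits. The following is a minimal sketch of that test; the function name is hypothetical and is not part of this specification.

```python
def is_aligned(address: int, n: int) -> bool:
    """Return True if a datum of size 2**n at byte address `address` is ALIGNED.

    The datum is aligned when the address is a multiple of 2**n, that is,
    when its n low-order bits are all zero.
    """
    if address < 0 or n < 0:
        raise ValueError("address and n must be non-negative")
    size = 1 << n
    return (address & (size - 1)) == 0


if __name__ == "__main__":
    # A quadword (2**3 bytes) at 0x1000 is aligned; at 0x1004 it is not.
    print(is_aligned(0x1000, 3), is_aligned(0x1004, 3))
```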
ALU 
Arithmetic logic unit. 
ANSI 
American National Standards Institute. An organization that develops and publishes 
standards for the computer industry. 
ASIC 
Application-specific integrated circuit. 
ASM 
Address space match. 
ASN 
See address space number. 
assert 
To cause a signal to change to its logical true state. 
AST 


See asynchronous system trap. 
asynchronous system trap (AST) 


A software-simulated interrupt to a user-defined routine. ASTs enable a user process to 
be notified asynchronously, with respect to that process, of the occurrence of a specific 
event. If a user process has defined an AST routine for an event, the system interrupts 
the process and executes the AST routine when that event occurs. When the AST rou- 
tine exits, the system resumes execution of the process at the point where it was inter- 
rupted. 


bandwidth 
Bandwidth is often used to express the rate of data transfer in a bus or an I/O channel. 
barrier transaction 


A transaction on the external interface as a result of an MB (memory barrier) instruction. 

Bcache 
See second-level cache. 

bidirectional 
Flowing in two directions. The buses are bidirectional; they carry both input and output 
signals. 

BiSI 

Built-in self-initialization. 

BiST 
Built-in self-test. 

bit 
Binary digit. The smallest unit of data in a binary notation system, designated as 0 or 1. 

bit time 
The total time that a signal conveys a single valid piece of information (specified in ns). 
All data and commands are associated with a clock, and the receivers latch on both the 
rise and fall of the clock. Bit times are a multiple of the 21264A clocks. Systems must 
produce a bit time identical to 21264A’s bit time. The bit time is one-half the period of 
the forwarding clock. 
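
The relationship stated in the bit time entry (bit time is a multiple of the 21264A clock, and is one-half the forwarding clock period) can be written out as a short calculation. This is an illustrative sketch only; the function names and the example clock values are assumptions, not figures from this specification.

```python
def bit_time_ns(gclk_period_ns: float, gclks_per_bit: float) -> float:
    """Bit time expressed as a multiple of the 21264A clock period."""
    if gclk_period_ns <= 0 or gclks_per_bit <= 0:
        raise ValueError("period and ratio must be positive")
    return gclk_period_ns * gclks_per_bit


def forwarded_clock_period_ns(bit_time: float) -> float:
    """The bit time is one-half the forwarding clock period."""
    return 2.0 * bit_time


if __name__ == "__main__":
    # Hypothetical numbers: a 2.0 ns GCLK period and 2 GCLKs per bit time.
    bt = bit_time_ns(2.0, 2)
    print(bt, forwarded_clock_period_ns(bt))
```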

BIU 
Bus interface unit. See Cbox. 

Block exchange 


Memory feature that improves bus bandwidth by paralleling a cache victim write-back 
with a cache miss fill. 


board-level cache 


See second-level cache. 
boot 


Short for bootstrap. Loading an operating system into memory is called booting. 


BSR 
Boundary-scan register. 

buffer 
An internal memory area used for temporary storage of data records during input or 
output operations. 

bugcheck 
A software condition, usually the response to software’s detection of an “internal incon- 
sistency,” which results in the execution of the system bugcheck code. 

bus 
A group of signals that consists of many transmission lines or wires. It interconnects 
computer system components to provide communications paths for addresses, data, and 
control information. 

byte 
Eight contiguous bits starting on an addressable byte boundary. The bits are numbered 
right to left, 0 through 7. 

byte granularity 
Memory systems are said to have byte granularity if adjacent bytes can be written con- 
currently and independently by different processes or processors. 

cache 


See cache memory. 
cache block 


The smallest unit of storage that can be allocated or manipulated in a cache. Also 
known as a cache line. 


cache coherence 


Maintaining cache coherence requires that when a processor accesses data cached in 
another processor, it must not receive incorrect data and when cached data is modified, 
all other processors that access that data receive modified data. Schemes for maintain- 
ing consistency can be implemented in hardware or software. Also called cache consis- 
tency. 


cache fill 


An operation that loads an entire cache block by using multiple read cycles from main 
memory. 


cache flush 


An operation that marks all cache blocks as invalid. 
cache hit 


The status returned when a logic unit probes a cache memory and finds a valid cache 
entry at the probed address. 


cache interference 


The result of an operation that adversely affects the mechanisms and procedures used to 
keep frequently used items in a cache. Such interference may cause frequently used 
items to be removed from a cache or incur significant overhead operations to ensure 
correct results. Either action hampers performance. 


cache line 

See cache block. 
cache line buffer 

A buffer used to store a block of cache memory. 
cache memory 


A small, high-speed memory placed between slower main memory and the processor. A 
cache increases effective memory transfer rates and processor speed. It contains copies 
of data recently used by the processor and fetches several bytes of data from memory in 
anticipation that the processor will access the next sequential series of bytes. The 
21264A microprocessor contains two on-chip internal caches. See also write-through 
cache and write-back cache. 


cache miss 


The status returned when cache memory is probed with no valid cache entry at the 
probed address. 


CALL_PAL instructions 
Special instructions used to invoke PALcode. 
Cbox 


External cache and system interface unit. Controls the Bcache and the system ports. 


central processing unit (CPU) 


The unit of the computer that is responsible for interpreting and executing instructions. 


CISC 
Complex instruction set computing. An instruction set that consists of a large number 
of complex instructions. Contrast with RISC. 

clean 
In the cache of a system bus node, refers to a cache line that is valid but has not been 
written. 

clock 


A signal used to synchronize the circuits in a computer. 
clock offset (or clkoffset) 


The delay intentionally added to the forwarded clock to meet the setup and hold 
requirements at the Receive Flop. 


CMOS 


Complementary metal-oxide semiconductor. A silicon device formed by a process that 
combines PMOS and NMOS semiconductor material. 


conditional branch instructions 


Instructions that test a register for positive/negative or for zero/nonzero. They can also 
test integer registers for even/odd. 


control and status register (CSR) 


A device or controller register that resides in the processor’s I/O space. The CSR ini- 
tiates device activity and records its status. 


CPI 
Cycles per instruction. 
CPU 
See central processing unit. 
CSR 
See control and status register. 
cycle 
One clock interval. 
data bus 
A group of wires that carry data. 
Dcache 
Data cache. A cache reserved for storage of data. The Dcache does not contain instruc- 
tions. 
DDR 
Dual-data rate. A dual-data rate SSRAM can provide data on both the rising and falling 
edges of the clock signal. 
denormal 
An IEEE floating-point bit pattern that represents a number whose magnitude lies 
between zero and the smallest finite number. 
DIP 


Dual inline package. 
direct-mapping cache 


A cache organization in which only one address comparison is needed to locate any 
data in the cache, because any block of main memory data can be placed in only one 
possible position in the cache. 


direct memory access (DMA) 
Access to memory by an I/O device that does not require processor intervention. 

dirty 
One status item for a cache block. The cache block is valid and has been written so that 
it may differ from the copy in system main memory. 

dirty victim 
Used in reference to a cache block in the cache of a system bus node. The cache block 
is valid but is about to be replaced due to a cache block resource conflict. The data must 
therefore be written to memory. 

DMA 
See direct memory access. 

DRAM 
Dynamic random-access memory. Read/write memory that must be refreshed (read 
from or written to) periodically to maintain the storage of information. 

DTB 
Data translation buffer. Also defined as Dstream translation buffer. 

DTL 
Diode-transistor logic. 

dual issue 
Two instructions are issued, in parallel, during the same microprocessor cycle. The 
instructions use different resources and so do not conflict. 

ECC 
Error correction code. Code and algorithms used by logic to facilitate error detection 
and correction. See also ECC error. 

ECC error 
An error detected by ECC logic, to indicate that data (or the protected “entity”) has 
been corrupted. The error may be correctable (soft error) or uncorrectable (hard error). 

ECL 
Emitter-coupled logic. 

EEPROM 
Electrically erasable programmable read-only memory. A memory device that can be 
byte-erased, written to, and read from. Contrast with FEPROM. 

external cache 


See second-level cache. 


FEPROM 
Flash-erasable programmable read-only memory. FEPROMs can be bank- or bulk- 
erased. Contrast with EEPROM. 
FET 
Field-effect transistor. 
FEU 
The unit within the 21264A microprocessor that performs floating-point calculations. 
firmware 


Machine instructions stored in nonvolatile memory. 
floating point 


A number system in which the position of the radix point is indicated by the exponent 
part and another part represents the significant digits or fractional part. 


flush 
See cache flush. 
forwarded clock 


A single-ended differential signal that is aligned with its associated fields. The for- 
warded clock is sourced and aligned by the sender with a period that is two times the bit 
time. Forwarded clocks must be 50% duty cycle clocks whose rising and falling edges 
are aligned with the changing edge of the data. 


FPGA 

Field-programmable gate array. 
FPLA 

Field-programmable logic array. 
FQ 


Floating-point issue queue. 
framing clock 


The framing clock defines the start of a transmission either from the system to the 
21264A or from the 21264A to the system. The framing clock is a power-of-2 multiple 
of the 21264A GCLK frequency, and is usually the system clock. The framing clock 
and the input oscillator can have the same frequency. The add_frame_select IPR sets 
the ratio of bit times to framing clock. The frame clock could have a period that is four 
times the bit time with an add_frame_select of 2X. Transfers begin on the rising and 
falling edge of the frame clock. This is useful for systems that have system clocks with 
a period too small to perform the synchronous reset of the clock forward logic. 
Additionally, the framing clock can have a period that is less than, equal to, or greater than 
the time it takes to send a full four-cycle command/address. 


GCLK 
Global clock within the 21264A. 


granularity 


A characteristic of storage systems that defines the amount of data that can be read and/ 
or written with a single instruction, or read and/or written independently. 


hardware interrupt request (HIR) 
An interrupt generated by a peripheral device. 
high-impedance state 


An electrical state of high resistance to current flow, which makes the device appear not 
physically connected to the circuit. 


hit 
See cache hit. 


Icache 


Instruction cache. A cache reserved for storage of instructions. One of the three areas of 
primary cache (located on the 21264A) used to store instructions. The Icache contains 
8KB of memory space. It is a direct-mapped cache. Icache blocks, or lines, contain 32 
bytes of instruction stream data with associated tag as well as a 6-bit ASM field and an 
8-bit branch history field per block. Icache does not contain hardware for maintaining 
cache coherency with memory and is unaffected by the invalidate bus. 

IDU 
A logic unit within the 21264A microprocessor that fetches, decodes, and issues 
instructions. It also controls the microprocessor pipeline. 

IEEE Standard 754 
A set of formats and operations that apply to floating-point numbers. The formats cover 
32-, 64-, and 80-bit operand sizes. 

IEEE Standard 1149.1 
A standard for the Test Access Port and Boundary Scan Architecture used in board- 
level manufacturing test procedures. 


Inf 
Infinity. 


INTnn 
The term INTnn, where nn is one of 2, 4, 8, 16, 32, or 64, refers to a data field size of nn 
contiguous NATURALLY ALIGNED bytes. For example, INT4 refers to a 
NATURALLY ALIGNED longword. 
interface reset 


A synchronously received reset signal that is used to preset and start the clock forward- 
ing circuitry. During this reset, all forwarded clocks are stopped and the presettable 
count values are applied to the counters; then, some number of cycles later, the clocks 
are enabled and are free running. 


Internal processor register (IPR) 


Special registers that are used to configure options or report status. 


IOWB 

I/O write buffer. 
IPGA 

Interstitial pin grid array. 
IQ 

Integer issue queue. 
ITB 

Instruction translation buffer. 
JFET 

Junction field-effect transistor. 
latency 

The amount of time it takes the system to respond to an event. 
LCC 

Leadless chip carrier. 
LFSR 


Linear feedback shift register. 
load/store architecture 


A characteristic of a machine architecture where data items are first loaded into a pro- 
cessor register, operated on, and then stored back to memory. No operations on memory 
other than load and store are provided by the instruction set. 


longword (LW) 


Four contiguous bytes starting on an arbitrary byte boundary. The bits are numbered 
from right to left, 0 through 31. 


LQ 
Load queue. 


LSB 
Least significant bit. 
machine check 
An operating system action triggered by certain system hardware-detected errors that 
can be fatal to system operation. Once triggered, machine check handler software 
analyzes the error. 

MAF 
Miss address file. 

main memory 
The large memory, external to the microprocessor, used for holding most instruction 
code and data. Usually built from cost-effective DRAM memory chips. May be used in 
connection with the microprocessor’s internal caches and an external cache. 

masked write 
A write cycle that only updates a subset of a nominal data block. 

MBO 
See must be one. 

Mbox 
This section of the processor unit performs address translation, interfaces to the 
Dcache, and performs several other functions. 

MBZ 
See must be zero. 

MESI protocol 
A cache consistency protocol with full support for multiprocessing. The MESI protocol 
consists of four states that define whether a block is modified (M), exclusive (E), shared 
(S), or invalid (I). 

MIPS 
Millions of instructions per second. 

miss 
See cache miss. 

module 
A board on which logic devices (such as transistors, resistors, and memory chips) are 
mounted and connected to perform a specific system function. 

module-level cache 
See second-level cache. 

MOS 
Metal-oxide semiconductor. 

MOSFET 
Metal-oxide semiconductor field-effect transistor. 

MSI 
Medium-scale integration. 
multiprocessing 


A processing method that replicates the sequential computer and interconnects the 
collection so that each processor can execute the same or a different program at the same 
time. 
must be one (MBO) 
A field that must be supplied as one. 
must be zero (MBZ) 
A field that is reserved and must be supplied as zero. If examined, it must be assumed to 
be UNDEFINED. 
NaN 
Not-a-Number. An IEEE floating-point bit pattern that represents something other than 
a number. This comes in two forms: signaling NaNs (for Alpha, those with an initial 
fraction bit of 0) and quiet NaNs (for Alpha, those with an initial fraction bit of 1). 
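
The signaling/quiet distinction in the NaN entry can be illustrated on a 64-bit IEEE double-precision bit pattern: all exponent bits set, a nonzero fraction, and the initial (most significant) fraction bit selecting the form. The sketch below is illustrative only; the function and constant names are hypothetical.

```python
EXPONENT_MASK = 0x7FF0_0000_0000_0000   # 11 exponent bits of an IEEE double
FRACTION_MASK = 0x000F_FFFF_FFFF_FFFF   # 52 fraction bits
INITIAL_FRACTION_BIT = 0x0008_0000_0000_0000


def classify_nan(bits: int) -> str:
    """Classify a 64-bit IEEE double bit pattern as quiet, signaling, or not a NaN."""
    is_nan = (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & FRACTION_MASK) != 0
    if not is_nan:
        return "not a NaN"
    # Initial fraction bit of 1 means quiet NaN; 0 means signaling NaN.
    return "quiet" if bits & INITIAL_FRACTION_BIT else "signaling"


if __name__ == "__main__":
    print(classify_nan(0x7FF8_0000_0000_0000))
    print(classify_nan(0x7FF0_0000_0000_0001))
```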
NATURALLY ALIGNED 
See ALIGNED. 
NATURALLY ALIGNED data 
Data stored in memory such that the address of the data is evenly divisible by the size of 
the data in bytes. For example, an ALIGNED longword is stored such that the address 
of the longword is evenly divisible by 4. 
NMOS 
N-type metal-oxide semiconductor. 
NVRAM 
Nonvolatile random-access memory. 
OBL 
Observability linear feedback shift register. 
octaword 


Sixteen contiguous bytes starting on an arbitrary byte boundary. The bits are numbered 
from right to left, 0 through 127. 


OpenVMS Alpha operating system 
The version of the OpenVMS operating system for Alpha platforms. 
operand 


The data or register upon which an operation is performed. 
output mux counter 
Counter used to select the output mux that drives address and data. It is reset with the 
Interface Reset and incremented by a copy of the locally generated forwarded clock. 

PAL 
Privileged architecture library. See also PALcode. See also Programmable array logic 
(hardware). A device that can be programmed by a process that blows individual fuses 
to create a circuit. 

PALcode 
Alpha privileged architecture library code, written to support Alpha microprocessors. 
PALcode implements architecturally defined behavior. 

PALmode 
A special environment for running PALcode routines. 

parameter 
A variable that is given a specific value that is passed to a program before execution. 

parity 
A method for checking the accuracy of data by calculating the sum of the number of 
ones in a piece of binary data. Even parity requires the correct sum to be an even 
number; odd parity requires the correct sum to be an odd number. 

PGA 
Pin grid array. 

pipeline 
A CPU design technique whereby multiple instructions are simultaneously overlapped 
in execution. 

PLA 
Programmable logic array. 

PLCC 
Plastic leadless chip carrier or plastic-leaded chip carrier. 

PLD 
Programmable logic device. 

PLL 
Phase-locked loop. 

PMOS 
P-type metal-oxide semiconductor. 

PQ 
Probe queue. 

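
The parity entry above describes counting the ones in a piece of binary data. As a minimal sketch of that idea (hypothetical helper names, not an interface of the 21264A), the parity bit for each convention is whatever makes the total count of ones even or odd:

```python
def count_ones(data: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    if data < 0:
        raise ValueError("data must be non-negative")
    return bin(data).count("1")


def even_parity_bit(data: int) -> int:
    """Parity bit that makes the total number of ones (data plus bit) even."""
    return count_ones(data) % 2


def odd_parity_bit(data: int) -> int:
    """Parity bit that makes the total number of ones (data plus bit) odd."""
    return 1 - even_parity_bit(data)


def check_even_parity(data: int, parity_bit: int) -> bool:
    """True if data together with its parity bit contains an even number of ones."""
    return (count_ones(data) + parity_bit) % 2 == 0


if __name__ == "__main__":
    word = 0b1011  # three ones
    print(even_parity_bit(word), odd_parity_bit(word))
```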
PQFP 
Plastic quad flat pack. 
primary cache 


The cache that is the fastest and closest to the processor. The first-level caches, located 
on the CPU chip, composed of the Dcache and Icache. 


program counter 


That portion of the CPU that contains the virtual address of the next instruction to be 
executed. Most current CPUs implement the program counter (PC) as a register. This 
register may be visible to the programmer through the instruction set. 


PROM 

Programmable read-only memory. 
pull-down resistor 

A resistor placed between a signal line and a negative voltage. 
pull-up resistor 

A resistor placed between a signal line and a positive voltage. 
QNaN 

Quiet NaN. See NaN. 
quad issue 


Four instructions are issued, in parallel, during the same microprocessor cycle. The 
instructions use different resources and so do not conflict. 


quadword 
Eight contiguous bytes starting on an arbitrary byte boundary. The bits are numbered 
from right to left, 0 through 63. 
RAM 
Random-access memory. 
RAS 
Row address select. 
RAW 


Read-after-write. 
READ_BLOCK 

A transaction where the 21264A requests that an external logic unit fetch read data. 
read data wrapping 


System feature that reduces apparent memory latency by allowing read data cycles to 
differ from the usual low-to-high sequence. Requires cooperation between the 21264A and 
external hardware. 
read stream buffers 


Arrangement whereby each memory module independently prefetches DRAM data 
prior to an actual read request for that data. Reduces average memory latency while 
improving total memory bandwidth. 


receive counter 


Counter used to enable the receive flops. It is clocked by the incoming forwarded clock 
and reset by the Interface Reset. 


receive mux counter 


The receive mux counter is preset to a selectable starting point and incremented by the 
locally generated forward clock. 


register 
A temporary storage or control location in hardware logic. 
reliability 


The probability a device or system will not fail to perform its intended functions during 
a specified time interval when operated under stated conditions. 


reset 
An action that causes a logic unit to interrupt the task it is performing and go to its ini- 
tialized state. 

RISC 
Reduced instruction set computing. A computer with an instruction set that is pared 
down and reduced in complexity so that most instructions can be performed in a single 
processor cycle. High-level compilers synthesize the more complex, least frequently used 
instructions by breaking them down into simpler instructions. This approach allows the RISC 
architecture to implement a small, hardware-assisted instruction set, thus eliminating 
the need for microcode. 

ROM 
Read-only memory. 

RTL 
Register-transfer logic. 

SAM 
Serial access memory. 

SBO 
Should be one. 

SBZ 
Should be zero. 

scheduling 


The process of ordering instruction execution to obtain optimum performance. 
SDRAM 
Synchronous dynamic random-access memory. 
second-level cache 


A cache memory provided outside of the microprocessor chip, usually located on the 
same module. Also called board-level, external, or module-level cache. 


set-associative 


A form of cache organization in which the location of a data block in main memory 
constrains, but does not completely determine, its location in the cache. Set-associative 
organization is a compromise between direct-mapped organization, in which data from 
a given address in main memory has only one possible cache location, and fully 
associative organization, in which data from anywhere in main memory can be put 
anywhere in the cache. An “n-way set-associative” cache allows data from a given address 
where in the cache. An “n-way set-associative” cache allows data from a given address 
in main memory to be cached in any of n locations. 
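
To make the three organizations in the set-associative entry concrete, the sketch below lists the cache slots in which a given address may be placed. The cache geometry in the example (eight 64-byte blocks) is hypothetical and is not that of any 21264A cache; the function name is likewise an assumption.

```python
def candidate_slots(address: int, block_size: int, num_blocks: int, ways: int) -> list[int]:
    """Return the cache slot numbers that may hold the block containing `address`.

    ways == 1 models a direct-mapped cache, ways == num_blocks a fully
    associative cache, and anything in between an n-way set-associative cache.
    """
    if num_blocks % ways != 0:
        raise ValueError("num_blocks must be a multiple of ways")
    num_sets = num_blocks // ways
    block_number = address // block_size
    set_index = block_number % num_sets
    # Slots of one set are numbered contiguously in this sketch.
    return [set_index * ways + way for way in range(ways)]


if __name__ == "__main__":
    addr = 0x1040
    print(candidate_slots(addr, 64, 8, 1))  # direct-mapped: one possible slot
    print(candidate_slots(addr, 64, 8, 2))  # 2-way set-associative: two slots
    print(candidate_slots(addr, 64, 8, 8))  # fully associative: any slot
```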


SIMM 

Single inline memory module. 
SIP 

Single inline package. 
SIPP 

Single inline pin package. 
SMD 

Surface mount device. 
SNaN 

Signaling NaN. See NaN. 
SRAM 

See SSRAM. 
SROM 

Serial read-only memory. 
SSI 

Small-scale integration. 
SSRAM 

Synchronous static random-access memory. 
stack 


An area of memory set aside for temporary data storage or for procedure and interrupt 
service linkages. A stack uses the last-in/first-out concept. As items are added to 
(pushed on) the stack, the stack pointer decrements. As items are retrieved from 
(popped off) the stack, the stack pointer increments. 
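
The stack entry describes a last-in/first-out area whose stack pointer decrements on push and increments on pop. A minimal model of that behavior follows; the class name and the word-indexed memory are assumptions made for illustration.

```python
class DescendingStack:
    """Last-in/first-out stack whose pointer decrements on push and increments on pop."""

    def __init__(self, size: int) -> None:
        self.memory = [0] * size
        self.sp = size  # an empty stack points just past the highest slot

    def push(self, value: int) -> None:
        if self.sp == 0:
            raise OverflowError("stack is full")
        self.sp -= 1
        self.memory[self.sp] = value

    def pop(self) -> int:
        if self.sp == len(self.memory):
            raise IndexError("stack is empty")
        value = self.memory[self.sp]
        self.sp += 1
        return value


if __name__ == "__main__":
    stack = DescendingStack(4)
    stack.push(10)
    stack.push(20)
    print(stack.sp, stack.pop(), stack.sp)
```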




STRAM 
Self-timed random-access memory. 
superpipelined 


Describes a pipelined machine that has a larger number of pipe stages and more com- 
plex scheduling and control. See also pipeline. 


superscalar 


Describes a machine architecture that allows multiple independent instructions to be 
issued in parallel during a given clock cycle. 


system clock 


The primary skew controlled clock used throughout the interface components to clock 
transfer between ASICs, main memory, and I/O bridges. 


tag 


The part of a cache block that holds the address information used to determine if a 
memory operation is a hit or a miss on that cache block. 


target clock 
Skew controlled clock that receives the output of the RECEIVE MUX. 


TB 
Translation buffer. 
tristate 
Refers to a bused line that has three states: high, low, and high-impedance. 
TTL 
Transistor-transistor logic. 
UART 


Universal asynchronous receiver-transmitter. 


UNALIGNED 


A datum of size 2**N stored at a byte address that is not a multiple of 2**N. 


unconditional branch instructions 


Instructions that change the flow of program control without regard to any condition. 
Contrast with conditional branch instructions. 


UNDEFINED 


An operation that may halt the processor or cause it to lose information. Only privileged 
software (that is, software running in kernel mode) can trigger an UNDEFINED opera- 
tion. (This meaning only applies when the word is written in all upper case.) 
UNPREDICTABLE 
Results or occurrences that do not disrupt the basic operation of the processor; the 
processor continues to execute instructions in its normal manner. Privileged or 
unprivileged software can trigger UNPREDICTABLE results or occurrences. (This meaning 
only applies when the word is written in all upper case.) 

UVPROM 
Ultraviolet (erasable) programmable read-only memory. 

VAF 
See victim address file. 

valid 
Allocated. Valid cache blocks have been loaded with data and may return cache hits 
when accessed. 

VDF 
See victim data file. 

VHSIC 
Very-high-speed integrated circuit. 

victim 
Used in reference to a cache block in the cache of a system bus node. The cache block 
is valid but is about to be replaced due to a cache block resource conflict. 


victim address file 


The victim address file and the victim data file, together, form an 8-entry buffer used to 
hold information for transactions to the Bcache and main memory. 


victim data file 


The victim address file and the victim data file, together, form an 8-entry buffer used to 
hold information for transactions to the Bcache and main memory. 


virtual cache 
A cache that is addressed with virtual addresses. The tag of the cache is a virtual 
address. This process allows direct addressing of the cache without having to go 
through the translation buffer, making cache hit times faster. 

VLSI 
Very-large-scale integration. 

VPC 
Virtual program counter. 

VRAM 
Video random-access memory. 

WAR 
Write-after-read. 


word 
Two contiguous bytes (16 bits) starting on an arbitrary byte boundary. The bits are num- 
bered from right to left, 0 through 15. 

write-back 


A cache management technique in which write operation data is written into cache but 
is not written into main memory in the same operation. This may result in temporary 
differences between cache data and main memory data. Some logic unit must maintain 
coherency between cache and main memory. 


write-back cache 


Copies are kept of any data in the region; read and write operations may use the copies, 
and write operations use additional state to determine whether there are other copies to 
invalidate or update. 


WRITE_BLOCK 


A transaction where the 21264A requests that an external logic unit process write data. 


write data wrapping 


System feature that reduces apparent memory latency by allowing write data cycles to 
differ from the usual low-to-high sequence. Requires cooperation between the 21264A and 
external hardware. 


write-through cache 


A cache management technique in which a write operation to cache also causes the 
same data to be written in main memory during the same operation. Copies are kept of 
any data in a region; read operations may use the copies, but write operations update the 
actual data location and either update or invalidate all copies. 
Aggregate mode, 6-18 

Aligned convention, xx 

Alpha instruction summary, A-1 
AMASK instruction values, 2-37 
ARITH synchronous trap, 6-14 


B_DA_OD pin type, 3-3, 9-2 
values for, 9-4 

B_DA_PP pin type, 3-3, 9-2 
values for, 9-4 

BC_BANK_ENABLE Cbox CSR, 4-50, 5-39, 
7-12 

BC_BPHASE_LD_VECTOR Cbox CSR, 4-44 
defined, 5-38 

BC_BURST_MODE_ENABLE Cbox CSR, 4-50 
defined, 5-35 

BC_CLEAN_VICTIM Cbox CSR, 4-22 
defined, 5-34 

BC_CLK_DELAY Cbox CSR, 4-44 
defined, 5-35 

BC_CLK_LD_VECTOR Cbox CSR, 4-44 
defined, 5-38 

BC_CLKFWD_ENABLE Cbox CSR, 4-46 
defined, 5-36 


Index 


BC_CLOCK_OUT Cbox CSR, 4-44 

BC_CPU_CLK_DELAY Cbox CSR, 4-43, 4-44 
defined, 5-38 

BC_CPU_LATE_WRITE_NUM Cbox CSR 
defined, 5-35 

BC_DDM_FALL_EN Cbox CSR, 4-46 
defined, 5-36 

BC_DDM_RISE_EN Cbox CSR, 4-46 
defined, 5-36 

BC_DDMF_ENABLE Cbox CSR, 4-46 
defined, 5-35 

BC_DDMR_ENABLE Cbox CSR, 4-46 
defined, 5-35 

BC_ENABLE Cbox CSR, 4-49, 5-39, 7-12 

BC_FDBK_EN Cbox CSR, 4-44 
defined, 5-38 

BC_FRM_CLK Cbox CSR, 4-46 
defined, 5-35 

BC_LAT_DATA_PATTERN Cbox CSR, 4-47 
defined, 5-35 

BC_LAT_TAG_PATTERN Cbox CSR, 4-47 
defined, 5-35 

BC_LATE_WRITE_NUM Cbox CSR, 4-48 
defined, 5-35 

BC_LATE_WRITE_UPPER Cbox CSR 
defined, 5-35 

BC_PENTIUM_MODE Cbox CSR, 4-50 
defined, 5-35 

BC_PERR error status in C_STAT, 5-42 

BC_RCV_MUX_CNT_PRESET Cbox CSR 
defined, 5-36 

BC_RCV_MUX_PRESET_CNT Cbox CSR, 4-47 

BC_RD_RD_BUBBLE Cbox CSR 
defined, 5-34 

BC_RD_WR_BUBBLES Cbox CSR, 4-48 
defined, 5-34 

BC_RDVICTIM Cbox CSR, 4-22, 4-24 
defined, 5-34 

BC_SIZE Cbox CSR, 4-49, 5-39, 7-12 
BC_SJ_BANK_ENABLE Cbox CSR 
defined, 5-34 

BC_TAG_DDM_FALL_EN Cbox CSR, 4-46 
defined, 5-35 

BC_TAG_DDM_RISE_EN Cbox CSR, 4-46 
defined, 5-36 

BC_WR_RD_BUBBLES Cbox CSR, 4-48 
defined, 5-34 

BC_WR_WR_BUBBLE Cbox CSR, 4-52 
defined, 5-34 

BC_WRT_STS Cbox CSR, 5-39, 7-12 


Bcache 
banking, 4-52 
bubbles on the data bus, 4-48 
clocking, 4-43 
control pins, 4-50 
data read transactions, 4-46 
data single-bit correctable ECC error, 8-5 
data single-bit correctable ECC error on a probe, 
8-9 
data write transactions, 4-47 
error case summary for, 8-10 
filling Dcache error, 8-6 
filling Icache error, 8-5 
forwarding clock pin groupings, E-1 
maximum clock ratio, 4-41 
port, 4-41 
port pins, 4-42 
programming the size of, 4-49 
setting clock period, 4-44 
structure of, 4-6 
tag parity errors, 8-5 
tag read transactions, 4-46 
victim read during an ECB instruction error, 


victim read during Dcache/Bcache miss error, 


victim read error, 8-7 
BcAdd_H signal pins, 3-3, 4-42 
characteristics, 4-49 
BcCheck_H signal pins, 3-3, 4-42 
BcData_H signal pins, 3-3, 4-42 
BcDataInClk_H signal pins, 3-3, 4-42 
using, 4-51 
BcDataOE_L signal pin, 3-3, 4-42 
BcDataOutClk_x signal pins, 3-4, 4-42 
BcDataWr_L signal pin, 3-4, 4-42 
BcLoad_L signal pin, 3-4, 4-42 
BcTag_H signal pins, 3-4, 4-42 
BcTagDirty_H signal pin, 3-4, 4-42 
BcTagInClk_H signal pin, 3-4, 4-42 
using, 4-51 
BcTagOE_L signal pin, 3-4, 4-42 
BcTagOutClk_x signal pins, 3-4, 4-42 


BcTagParity_H signal pin, 3-4, 4-42 
BcTagShared_H signal pin, 3-4, 4-43 
BcTagValid_H signal pin, 3-4, 4-43 
BcTagWr_L signal pin, 3-4, 4-43 
BcVref signal pin, 3-4, 4-43 


Bidirectional differential amplifier receiver - 
open-drain. See B_DA_OD 


Bidirectional differential amplifier receiver - 
push-pull. See B_DA_PP 


Binary multiple abbreviations, xix 

BiST. See Built-in self-test 

Bit notation conventions, xx 

Boundary-scan register, B-1 

Branch history table, initialized by BiST, 7-12 

Branch misprediction, pipeline abort delay from, 
2-16 

Branch predictor, 2-3 

BSDL description of the boundary-scan register, 
B-1 

Built-in self-test, 11-5 

load, 7-6 


C 


C_ADDR Cbox read register field, 5-42 
C_DATA Cbox data register, 5-33 
at power-on reset state, 7-16 
C_SHFT Cbox shift register, 5-33 
at power-on reset state, 7-16 
C_STAT Cbox read register field, 5-42 
C_STS Cbox read register field, 5-42 
C_SYNDROME_0 Cbox read register field, 5-41 
C_SYNDROME_1 Cbox read register field, 5-41 
Cache block states, 4-9 
response to 21264A commands, 4-10 
transitions, 4-9 


Cache coherency, 4-8 
CALL_PAL entry points, 6-12 
Caution convention, xx 
Cbox 


data register C_DATA, 5-33 
described, 2-11, 4-3 
duplicate Dcache tag array, 2-11 
duplicate Dcache tag array with, 4-13 
HW_MTPR and HW_MFPR to CSR, D-15 
I/O write buffer, 2-11 
internal processor registers, 5-3 
probe queue, 2-11 
read register, 5-41 
shift register C_SHFT, 5-33 
victim address file, 2-11 
WRITE_MANY chain, 5-38 
WRITE_MANY chain example, 5-39 
WRITE_ONCE chain, 5-33 

CC cycle counter register, 5-3 
at power-on reset state, 7-15 

CC_CTL cycle counter control register, 5-3 
at power-on reset state, 7-15 


CFR_EV6CLK_DELAY Cbox CSR, defined, 5-37 
CFR_FRMCLK_DELAY Cbox CSR, defined, 5-38 
CFR_GCLK_DELAY Cbox CSR, defined, 5-37 
ChangeToDirtyFail, SysDc command, 4-10, 4-11, 
4-12 
ChangeToDirtySuccess, SysDc command, 4-10, 
4-11, 4-12 
Choice predictor, 2-5 
ChxToDirty, 21264A command, 4-11 
CLAMP public instruction, B-1 
Clean cache block state, 4-9 
Clean/Shared cache block state, 4-9 
CleanToDirty, 21264A command, 4-21, 4-38 
system probes, with, 4-40 
CleanVictimBlk, 21264A command, 4-21, 4-38 
ClkFwdRst_H signal pin, 3-4, 4-29 
with system initialization, 7-7 
ClkIn_x signal pins, 3-4 
Clock forwarding, 7-4 
CLR_MAP clear virtual-to-physical map register, 
5-21 
at power-on reset state, 7-15 
CMOV instruction, special cases of, 2-26 
COLD reset machine state, 7-17 


Commands 


21264A to system, 4-18 
system to 21264A, 4-25 
when to NXM, 4-37 


Conventions, xix 
abbreviations, xix 
address, xx 
aligned, xx 
bit notation, xx 
caution, xx 
data units, xxi 
do not care, xxi 
external, xxi 
field notation, xxi 
note, xxi 
numbering, xxi 
ranges and extents, xxi 
register figures, xxi 
signal names, xxi 
unaligned, xx 
X, xxi 

CTAG, 4-13 


D 


Data cache. See Dcache 


Data merging 
load instructions in I/O address space, 2-27 
store instructions in I/O address space, 2-29 


Data transfer commands, system, 4-27 


Data types 
floating point support, 1-2 
integer supported, 1-2 
supported, 1-1 

Data units convention, xxi 


Data wrap, 4-35 


double-pumped, 4-36 
interleaved, 4-36 


DATA_VALID_DLY Cbox CSR, defined, 5-38 


dc 
characteristics of, 9-2 
input pin capacitance defined, 9-2 
test load defined, 9-2 
voltage on signal pins, 9-1 
DC_CTL Dcache control register, 5-30 
at power-on reset state, 7-16 
error correction and, 8-2 


DC_PERR error status in C_STAT, 5-42 


DC_STAT Dcache status register, 5-31 
at power-on reset state, 7-16 
Dcache 
described, 2-12 
duplicate tag parity errors, 8-4 
duplicate tags with, 4-13 
error case summary for, 8-10 
fill from Bcache error, 8-6 
fill from memory errors, 8-8 
initialized by BiST, 7-12 
pipelined, 2-16 
single-bit correctable ECC error, 8-3 
store second error, 8-4 
tag parity errors, 8-2 
victim extracts, 8-4 
Dcache data single-bit correctable ECC errors, 8-3 
Dcache tag, initialized by BiST, 7-12 
DCOK_H signal pin, 3-4 
power-on reset flow, 7-1 
DCVIC_THRESHOLD Cbox CSR, defined, 5-34 
DFAULT fault, 6-13 
Differential 21264A clocks, 7-19 
Differential reference clocks, 7-19 
Dirty cache block state, 4-9 
Dirty/Shared cache block state, 4-9 
Do not care convention, xxi 
Double-bit fill errors, 8-9 
DOWN1 reset machine state, 7-18 
DOWN2 reset machine state, 7-19 
DOWN3 reset machine state, 7-19 
Dstream translation buffer, 2-13 
See also DTB 
DSTREAM_BC_DBL error status in C_STAT, 
5-42 
DSTREAM_BC_ERR error status in C_STAT, 
5-42 
DSTREAM_DC_ERR error status in C_STAT, 
5-42 
DSTREAM_MEM_DBL error status in C_STAT, 
5-42 
DSTREAM_MEM_ERR error status in C_STAT, 
5-42 
DTAG. See Duplicate Dcache tag array 
DTB entries, writing multiple in same PAL flow, 
D-18 
DTB fill, 6-14 
DTB, pipeline abort delay with, 2-16 
DTB_ALTMODE alternate processor mode register, 
5-26 
at power-on reset state, 7-15 


DTB_ASN0 address space number register 0 
at power-on reset state, 7-15 

DTB_ASN0 address space number registers 0, 5-28 

DTB_ASN1 address space number register 1, 5-28 
at power-on reset state, 7-15 

DTB_IA invalidate-all process register, 5-27 
at power-on reset state, 7-15 

DTB_IAP invalidate-all (ASM=0) process register, 
5-27 
at power-on reset state, 7-15 

DTB_IS0 invalidate single (array 0) register, 5-27 
at power-on reset state, 7-15 

DTB_IS1 invalidate single (array 1) register, 5-27 
at power-on reset state, 7-15 

DTB_PTE0 array write 0 register 
at power-on reset state, 7-15 
MTPR to, D-12 

DTB_PTE0 array write register 0, 5-26 

DTB_PTE1 array write 1 register, 5-26 
at power-on reset state, 7-15 
MTPR to, D-12 

DTB_TAG0 array write 0 register, 5-25 
at power-on reset state, 7-15 
MTPR to, D-12 

DTB_TAG1 array write 1 register, 5-25 
at power-on reset state, 7-15 
MTPR to, D-12 


DTBM_DOUBLE_3 fault, 6-13 
DTBM_DOUBLE_4 fault, 6-13 
DTBM_SINGLE fault, 6-13 

Dual-data rate SSRAM pin assignments, E-3 
DUP_TAG_ENABLE Cbox CSR, defined, 5-34 
Duplicate Dcache tag array, 2-11 

Duplicate Dcache, initialized by BiST, 7-12 
Duplicate tag array, Cbox copy. See CTAG 
Duplicate tag stores, Bcache, 4-7 


E 


Ebox 
cycle counter control register CC_CTL, 5-3 
cycle counter register CC, 5-3 
described, 2-8 
executed in pipeline, 2-16 
internal processor registers, 5-1 
slotting, 2-18 
subclusters, 2-18 
virtual address control register VA_CTL, 5-4 
virtual address format register VA_FORM, 5-5 
virtual address register, 5-4 


ECB instruction, external interface reference, 4-5 
ECC 


64-bit data and check bit code, 8-2 

Dcache data single-bit correctable errors, 8-3 

for system data bus, 8-2 

memory/system port single-bit correctable 
errors, 8-7 

store instructions, 8-4 


ENABLE_EVICT Cbox CSR, 4-22, 5-39 

ENABLE_PROBE_CHECK Cbox CSR, 8-2 
defined, 5-35 

ENABLE_STC_COMMAND Cbox CSR, defined, 
5-35 

Energy star certification, 7-9 

Error case summary, 8-10 

Error correction code. See ECC 

Error detection mechanisms, 8-1 

EV6Clk_x signal pins, 3-4 

Evict, 21264A command, 4-12, 4-21, 4-38 

EVICT_ENABLE Cbox CSR, 7-12 


EXC_ADDR exception address register, 5-8 


after fault reset, 7-8 
at power-on reset state, 7-14 


EXC_SUM exception summary register, 5-13 
at power-on reset state, 7-15 
Exception and interrupt logic, 2-8 
Exception condition summary, A-15 
External cache and system interface unit. See Cbox 
External convention, xxi 
External interface initialization, 7-14 
EXTEST public instruction, B-1 


F 


F31 


load instructions with, 2-23 
retire instructions with, 2-22 


Fast data disable mode, 4-32 

Fast data mode, 4-28, 4-30 

FAST_MODE_DISABLE Cbox CSR, 4-28 
defined, 5-34 

Fault reset flow, 7-8 

Fault reset sequence of operations, 7-9 

FAULT_RESET reset machine state, 7-18 


Fbox 


described, 2-10 
executed in pipeline, 2-16 


FEN fault, 6-13 


FetchBlk, 21264A command, 4-21, 4-38 
system probes, with, 4-40 


FetchBlkSpec, 21264A command, 4-21, 4-38 
Field notation convention, xxi 


Floating-point arithmetic trap, pipeline abort delay 
with, 2-16 


Floating-point control register, 2-35 
PALcode emulation of, 6-11 
Floating-point execution unit. See Fbox 


Floating-point instructions 
IEEE, A-9 
independent, A-11 
VAX, A-11 


Floating-point issue queue, 2-7 
Forwarding clock pin groupings, E-1 
FPCR. See Floating-point control register 
FQ. See Floating-point issue queue 
FrameClk_x signal pins, 3-5, 4-29 


G 


GCLK, 7-19 
Global predictor, 2-4 


H 


Heat sink center temperature, 10-1 
Heat sink specifications, 10-3 
HW_INT_CLR hardware interrupt clear register, 
5-12 

at power-on reset state, 7-15 

updating, D-17 
HW_LD PALcode instruction, 6-3, A-9, D-17 
HW_MFPR PALcode instruction, 6-6, A-9 
HW_MTPR PALcode instruction, 6-6, A-9 
HW_REI PALcode instruction, A-9 
HW_RET PALcode instruction, 6-5 
HW_ST PALcode instruction, 6-4, A-9 


I 


I/O address space 
instruction data merging, 2-29 
load instruction data merging, 2-27 
load instructions with, 2-27 
store instructions with, 2-28 
I/O write buffer, 2-11 
defined, 2-32 
I_CTL Ibox control register, 5-15 
after fault reset, 7-8 
after warm reset, 7-11 
at power-on reset state, 7-15 
PALshadow registers, 6-11 
through sleep mode, 7-10 
VA_48 field update, D-16 


I_DA pin type, 3-3, 9-2 
values for, 9-3 
I_DA_CLK pin type, 3-3, 9-2 
values for, 9-3 
I_DC_POWER pin type, 9-2 
I_DC_REF pin type, 3-3, 9-2 
values for, 9-3 
I_STAT Ibox status register, 5-18 
at power-on reset state, 7-15 
IACV fault, 6-13 


Ibox 


branch predictor, 2-3 

clear virtual-to-physical map register 
CLR_MAP, 5-21 

exception address register EXC_ADDR, 5-8 

exception and interrupt logic, 2-8 

exception summary register EXC_SUM, 5-13 

floating-point issue queue, 2-7 

hardware interrupt clear register HW_INT_CLR, 
5-12 

Ibox control register I_CTL, 5-15 

Ibox process context register PCTX, 5-21 

Ibox status register I_STAT, 5-18 

Icache flush ASM register IC_FLUSH_ASM, 
5-21 

Icache flush register IC_FLUSH, 5-21 

instruction fetch logic, 2-6 

instruction virtual address format register 
IVA_FORM, 5-9 

instruction-stream translation buffer, 2-5 

integer issue queue, 2-6 

internal processor registers, 5-1 

interrupt enable and current processor mode 
register IER_CM, 5-9 

interrupt summary register ISUM, 5-11 

ITB invalidate single register ITB_IS, 5-7 

ITB invalidate-all ASM (ASM=0) register 
ITB_IAP, 5-7 

ITB invalidate-all register ITB_IA, 5-7 

ITB PTE array write register ITB_PTE, 5-6 

ITB tag array write register ITB_TAG, 5-6 

PAL base register PAL_BASE, 5-15 

performance counter control register 
PCTR_CTL, 5-23 

ProfileMe register PMPC, 5-8 

register rename maps, 2-6 

retire logic, 2-8 

retire logic and mapper, required sequence for, 

sleep mode register SLEEP, 5-21 

software interrupt request register SIRR, 5-10 

subsections in, 2-2 

virtual program counter logic, 2-2 


IC_FLUSH Icache flush register 

at power-on reset state, 7-15 
IC_FLUSH_ASM Icache flush ASM register, 5-21 
Icache 

data errors, 8-2 

error case summary for, 8-10 

fill from Bcache error, 8-5 

fill from memory error, 8-7 

flush register IC_FLUSH, 5-21 

initialized by BiST, 7-12 

tag, initialized by BiST, 7-12 
IEEE 1149.1 


notes for compliance to, 11-7 
test port reset, 7-16 
test port, operation of, 11-3 


IEEE floating-point conformance, A-14 
IEEE floating-point instruction opcodes, A-9 


IER_CM interrupt enable and current processor mode 
register, 5-9 
at power-on reset state, 7-15 

IMPLVER instruction values, 2-38 

Independent floating-point function codes, A-11 

INIT_MODE Cbox CSR, 5-39, 7-12 

Initialization mode processing, 7-12 

Input dc reference pin. See I_DC_REF pin type 

Input differential amplifier clock receiver. See 
I_DA_CLK pin type 

Input differential amplifier receiver. See I_DA pin 
type 

Instruction fetch logic, 2-6 

Instruction fetch, issue, and retire unit. See Ibox 

Instruction fetch, pipelined, 2-14 

Instruction issue rules, 2-16 

Instruction latencies, pipelined, 2-20 

Instruction ordering, 2-30 

Instruction retire latencies, minimum, 2-21 


Instruction retire rules 
F31, 2-22 
floating-point divide, 2-22 
floating-point square root, 2-22 
pipelined, 2-21 
R31, 2-22 
Instruction slot, pipelined, 2-14 
Instruction-stream translation buffer, 2-5 


Int_Add_BcClk internal forwarded clock, 4-43, 
4-47 


Int_Data_BcClk internal forwarded clock, 4-43, 
4-48 


INT_FWD_CLK clock queue, 4-29 
Integer arithmetic trap, pipeline abort delay with, 
2-16 
Integer execution unit. See Ebox 
Integer issue queue, 2-6 
pipelined, 2-15 
Internal processor registers, 5-1 


accessing, 6-7 
explicitly written, 6-8 
implicitly written, 6-9 
ordering access, 6-9 
paired fetch order, 6-9 
scoreboard bits for, 6-8 


INTERRUPT interrupt, 6-14 

INVAL_TO_DIRTY Cbox CSR, 4-22 
programming, 4-22 

INVAL_TO_DIRTY_ENABLE Cbox CSR, 5-39, 
7-12 

InvalToDirty, 21264A command, 4-12, 4-21, 4-38 
system probes, with, 4-40 

InvalToDirtyVic, 21264A command, 4-21, 4-38 

IOWB. See I/O write buffer 

IPRs. See Internal processor registers 

IQ. See Integer issue queue 

IRQ_H signal pins, 3-5 

Istream, 2-5 

Istream memory references 


translation to external references, 4-5 
ISTREAM_BC_DBL error status in C_STAT, 5-42 


ISTREAM_BC_ERR error status in C_STAT, 5-42 


ISTREAM_MEM_DBL error status in C_STAT, 
5-42 
ISTREAM_MEM_ERR error status in C_STAT, 
5-42 
ISUM interrupt summary register, 5-11 
at power-on reset state, 7-15 
ITB, 2-5 
ITB fill, 6-16 
ITB miss, pipeline abort delay with, 2-16 
ITB_IA invalidate-all register, 5-7 
at power-on reset state, 7-14 
ITB_IAP invalidate-all (ASM=0) register, 5-7 
at power-on reset state, 7-14 
ITB_IS invalidate single register, 5-7 
at power-on reset state, 7-14 
ITB_MISS fault, 6-14 
ITB_PTE array write register, 5-6 
at power-on reset state, 7-14 
ITB_TAG array write register, 5-6 
at power-on reset state, 7-14 
IVA_FORM instruction virtual address format 
register, 5-9 
at power-on reset state, 7-14 


J 


JITTER_CMD Cbox CSR, defined, 5-38 
JMP misprediction, in PALcode, D-15 


JSR misprediction 

in PALcode, D-15 

pipeline abort delay with, 2-16 
JSR_COR misprediction, in PALcode, D-15 


Junction temperature, 9-1 


L 


Late-write non-bursting SSRAM pin assignments, 
E-2 

LDBU instruction, normal prefetch with, 2-23 

LDF instruction, normal prefetch with, 2-23 

LDG instruction, normal prefetch with, 2-23 

LDQ instruction, prefetch with evict next, 2-23 

LDS instruction, prefetch with modify intent, 2-23 

LDT instruction, normal prefetch with, 2-23 

LDWU instruction, normal prefetch with, 2-23 


LDx_L instructions 
in-order processing for, 4-14 
locking mechanism for, 4-14 
Load hit speculation, 2-24 
Load instructions 
ECC with, 8-3 
I/O reference ordering, 2-30 
Mbox order traps, 2-31 


memory reference ordering, 2-30 
translation to external interface, 4-5 


Load queue, described, 2-13 
Load-load order trap, 2-31 
Local predictor, 2-4 

Lock mechanism, 4-13 

Logic symbol, the 21264A, 3-2 
LQ. See Load queue 


M 


M_CTL Mbox control register, 5-29 
at power-on reset state, 7-15 
MAF. See Miss address file 


MB instruction processing, 2-32 

MB, 21264A command, 4-12, 4-21 
MB_CNT Cbox CSR, operation, 2-32 
MBDone, SysDc command, 4-12 
Mbox 

Dcache control register DC_CTL, 5-30 

Dcache status register DC_STAT, 5-31 

described, 2-12 

Dstream translation buffer, 2-13 

DTB address space number registers 0 and 1 
DTB_ASNx, 5-28 

DTB alternate processor mode register 
DTB_ALTMODE, 5-26 

DTB invalidate-all (ASM=0) process register 
DTB_IAP, 5-27 

DTB invalidate-all process register DTB_IA, 
5-27 


DTB invalidate-single registers 0 and 1 
DTB_ISx, 5-27 
DTB PTE array write registers 0 and 1 
DTB_PTEx, 5-26 
DTB tag array write registers 0 and 1 
DTB_TAGx, 5-25 
internal processor registers, 5-2 
load queue, 2-13 
Mbox control register M_CTL, 5-29 
memory management status register 
MM_STAT, 5-28 
miss address file, 2-13 
order traps, 2-31 
pipeline abort delay with order trap, 2-16 
pipeline abort delays, 2-16 
store queue, 2-13 
MBOX_BC_PRB_STALL Cbox CSR, defined, 
5-35 
MCHK interrupt, 6-14 
Mechanical specifications, 3-16 


Memory 
error case summary for, 8-11 
filling Dcache errors, 8-8 
filling Icache errors, 8-7 
Memory address space 
load instructions with, 2-27 
merging rules, 2-29 
store instructions with, 2-28 
Memory barrier instructions 
translation to external interface, 4-5 
Memory barriers, 2-32 
Memory reference unit. See Mbox 
MF_FPCR instruction, 6-12 


Microarchitecture 
summarized, 2-1 
MiscVref signal pin, 3-5 
Miss address file, 2-13 
I/O address space loads, 2-27 
memory address space loads, 2-27 
memory address space stores, 2-28 
MM_STAT memory management status register, 
5-28 
at power-on reset state, 7-15 
MT_FPCR instruction, 6-12 


MT_FPCR synchronous trap, 6-14 


N 


NoConnect pin type, 3-3 
Nonexistent memory 

processing, 4-37 
NOP, 21264A command, 4-20 
Note convention, xxi 
Numbering convention, xxi 
NXM. See Nonexistent memory 
NZNOP, 21264A command, 4-20 


O 


O_OD pin type, 3-3, 9-2 
values for, 9-4 

O_OD_TP pin type, 3-3, 9-2 
values for, 9-4 

O_PP pin type, 3-3, 9-2 
values for, 9-5 

O_PP_CLK pin type, 3-3, 9-2 
values for, 9-5 

OPCDEC fault, 6-13 

Opcodes 
IEEE floating-point, A-9 
independent floating-point, A-11 
reserved for Compaq, A-8 
reserved for PALcode, A-9 
summary of, A-12 
VAX floating-point, A-11 


Open-drain driver for test pins. See O_OD_TP 
Open-drain output driver. See O_OD pin type 
Operating temperature, 10-1 


P 


Packaging, 3-18 
Paired instruction fetch order, 6-9 
PAL_BASE register, 5-15 

after fault reset, 7-8 

after warm reset, 7-11 


at power-on reset state, 7-15 
through sleep mode, 7-10 
PALcode 
conditional branches in, D-14 
described, 6-1 
entry points for, 6-12 
exception entry points, 6-13 
guidelines for, D-1 
HW_LD instruction, 6-3 
HW_MFPR instruction, 6-6 
HW_MTPR instruction, 6-6 
HW_RET instruction, 6-5 
HW_ST instruction, 6-4 
required function codes, 6-3 
reserved opcodes for, 6-3 
restrictions for, D-1 


PALmode environment, 6-2 
PALshadow registers, 6-11 


PCTR_CTL performance counter control counter 
register 
updating, D-16 
PCTR_CTL performance counter control register, 
5-23 
at power-on reset state, 7-15 
updating, D-17 
PCTX Ibox process context register, 5-21 
after fault reset, 7-8 
after warm reset, 7-11 
at power-on reset state, 7-15 
through sleep mode, 7-10 
Phase-lock loop. See PLL 
Physical address considerations, 4-4 
Pipeline 
abort delay, 2-16 
Dcache access, 2-16 
Ebox execution, 2-16 
Ebox slotting, 2-18 
Fbox execution, 2-16 
instruction fetch, 2-14 
instruction group definitions, 2-17 
instruction issue rules, 2-16 
instruction latencies, 2-20 
instruction retire rules, 2-21 
instruction slot, 2-14 
issue queue, 2-15 
organization, 2-13 
register maps, 2-15 
register reads, 2-16 
PLL 
description, 7-19 
output clocks, 7-19 
ramp up, 7-6 
PLL_IDD, values for, 9-3 
PLL_VDD signal pin, 3-5 
PLL_VDD, values for, 9-3 
PllBypass_H signal pin, 3-5 


PMPC ProfileMe register, 5-8 


Ports 


IEEE 1149.1, 11-3 
serial terminal, 11-2 
SROM load, 11-2 


Power 


maximum, 9-1 
sleep defined, 9-3 


Power supply sequencing, 9-5 


Power-on 
flow signals and constraints, 7-7 
reset flow, 7-1 
self-test and initialization, 11-5 
timing sequence, 7-3 

PRB_TAG_ONLY Cbox CSR, 4-27 
defined, 5-34 

Privileged architecture library code 
See PALcode 

Probe commands, system, 4-25, 4-38 


Probe queue, 2-11 
PROBE_BC_ERR error status in C_STAT, 5-42 


ProbeResponse, 21264A command, 4-20, 4-23, 
4-37 


ProfileMe mode, 6-20 
Push-pull output clock driver. See O_PP_CLK 
Push-pull output driver. See O_PP 


R 


R31 


load instructions with, 2-23 
retire instructions with, 2-22 
speculative loads to, 2-24 


RAMP1 reset machine state, 7-17 
RAMP2 reset machine state, 7-18 
Ranges and extents convention, xxi 
RdBlk, 21264A command, 4-37 
RdBlkI, 21264A command, 4-38 
RdBlkMod, 21264A command, 4-38 
RdBlkModSpec, 21264A command, 4-38 
RdBlkModVic, 21264A command, 4-38 
RdBlkSpec, 21264A command, 4-37 
RdBlkSpecI, 21264A command, 4-38 
RdBlkVic, 21264A command, 4-37 
RdBlkVicI, 21264A command, 4-38 
RdBytes, 21264A command, 4-38 
RdLWs, 21264A command, 4-38 
RdQWs, 21264A command, 4-38 


RDVIC_ACK_INHIBIT Cbox CSR, 4-24, 4-25 
defined, 5-34 
ReadBlk, 21264A command, 4-21 
system probes, with, 4-40 
ReadBlkI, 21264A command, 4-21 


ReadBlkMod, 21264A command, 4-21 

system probes, with, 4-40 
ReadBlkModSpec, 21264A command, 4-21 
ReadBlkModVic, 21264A command, 4-21 
ReadBlkSpec, 21264A command, 4-21 
ReadBlkSpecI, 21264A command, 4-21 
ReadBlkVic, 21264A command, 4-21 
ReadBlkVicI, 21264A command, 4-21 
ReadBytes, 21264A command, 4-21 
ReadData, SysDc command, 4-10, 4-11, 4-12 
ReadDataDirty, SysDc command, 4-10, 4-11, 4-12 
ReadDataError, SysDc command, 4-10, 4-11, 4-12 


ReadDataShared, SysDc command, 4-10, 4-11, 
4-12 


ReadDataShared/Dirty, SysDc command, 4-10, 
4-11, 4-12 

ReadLWs, 21264A command, 4-21 
ReadQWs, 21264A command, 4-21 
Register access abbreviations, xix 
Register figure conventions, xxi 
Register maps, pipelined, 2-15 
Register rename maps, 2-6 
Replay traps, 2-31 
RESET interrupt, 6-14 
Reset state machine 

major operations of, 7-1 
Reset_L signal pin, 3-5 

power-on reset flow, 7-1 
RET misprediction, in PALcode, D-15 
Retire logic, 2-8, D-1 
RO,n convention, xix 
RUN reset machine state, 7-18 


RW,n convention, xx 


S 


SAMPLE public instruction, B-1 
Scrubbing single-bit errors, D-18 
I_CTL Ibox control register 
updating I_CTL, D-17 
Second-level cache. See Bcache 


Security holes 
with UNPREDICTABLE results, xxii 


Serial terminal port, 11-2 
SET_DIRTY_ENABLE Cbox CSR, 4-22, 5-39, 
7-12 
programming, 4-22 
SharedToDirty, 21264A command, 4-21, 4-38 
system probes, with, 4-40 
Signal name convention, xxi 
Signal pin types, defined, 3-3 
Signal pins 
test, 11-1 
Single-bit error scrubbing, D-18 
Single-bit errors in hardware, correcting, 8-2 
SIRR software interrupt request register, 5-10 
at power-on reset state, 7-15 
SKEWED_FILL_MODE Cbox CSR 
defined, 5-34 
Sleep mode 
flow, 7-9 
timing sequence, 7-11 
SLEEP mode register, 5-21 
at power-on reset state, 7-15 
Spare pin type, 3-3 
SPEC_READ_ENABLE Cbox CSR, 4-22 
defined, 5-35 
SQ. See Store queue 


SROM content map, 11-6 

SROM initialization, 11-5 

SROM interface, in microarchitecture, 2-13 
SROM line, Icache bit fields in a, 11-6 
SROM load, 7-6 

SROM load operation, 11-2 

SromClk_H signal pin, 3-5, 11-2 
SromData_H signal pin, 3-5, 11-2 
SromOE_L signal pin, 3-5, 11-2 


SSRAMs 
dual-data rate pin assignments, E-3 
late-write non-bursting pin assignments, E-2 
STC_ENABLE Cbox CSR, 4-23 


STCChangeToDirty, 21264A command, 4-12, 
4-21, 4-38 


Storage temperature, 9-1 


Store instructions 


Dcache ECC errors with, 8-4 

I/O address space, 2-28 

I/O reference ordering, 2-30 

Mbox order traps, 2-31 

memory address space, 2-28 
memory reference ordering, 2-30 
translation to external interface, 4-5 
Store queue, 2-13 
Store-load order trap, 2-31 
STx_C instructions 
in-order processing for, 4-14 
locking mechanism for, 4-14 
Supply voltage signal pins. See I_DC_POWER pin 
type 
Synchronous static random-access memory. See 
SSRAMs 
SYS_BPHASE_LD_VECTOR Cbox CSR, 4-17 
defined, 5-38 
SYS_BUS_FORMAT Cbox CSR, defined, 5-34 
SYS_BUS_SIZE Cbox CSR, 4-20 
defined, 5-34 
SYS_CLK_DELAY Cbox CSR, defined, 5-36 
SYS_CLK_LD_VECTOR Cbox CSR, 4-17 
defined, 5-38 
SYS_CLK_RATIO Cbox CSR, defined, 5-34 
SYS_CLKFWD_ENABLE Cbox CSR, defined, 
5-36 
SYS_CPU_CLK_DELAY Cbox CSR 
defined, 5-38 
SYS_DDM_FALL_EN Cbox CSR, 4-18 
defined, 5-36 
SYS_DDM_RD_FALL_EN Cbox CSR, 4-18 
SYS_DDM_RD_RISE_EN Cbox CSR, 4-18 
SYS_DDM_RISE_EN Cbox CSR, 4-18 
defined, 5-36 
SYS_DDMF_ENABLE Cbox CSR, 4-18 
defined, 5-36 
SYS_DDMR_ENABLE Cbox CSR, 4-18 
defined, 5-36 
SYS_FDBK_EN Cbox CSR, 4-17 
defined, 5-38 
SYS_FRAME_LD_VECTOR Cbox CSR, 4-18, 
4-29 
defined, 5-38 
SYS_RCV_MUX_CNT_PRESET Cbox CSR, 4-29 
defined, 5-36 
SYS_RCV_MUX_PRESET Cbox CSR, 4-32 


SysAddIn_L signal pins, 3-5 
SysAddInClk_L signal pin, 3-5 
SysAddOut_L signal pins, 3-5 
SysAddOutClk_L signal pin, 3-5 


SYSBUS_ACK_LIMIT Cbox CSR, 4-24 
defined, 5-34 
SYSBUS_FORMAT Cbox CSR, 4-20 


SYSBUS_MB_ENABLE Cbox CSR, 4-22 


defined, 5-34 
operation, 2-32 


SYSBUS_VIC_LIMIT Cbox CSR, 4-25 
defined, 5-34 
SysCheck_L signal pin, 3-5 
SYSCLK, 4-29 
SysData_L signal pin, 3-5 
SysDataInClk_H signal pin, 3-5 
SysDataInValid_L signal pin, 3-5 
rules for, 4-33 
SysDataOutClk_L signal pin, 3-5 
SysDataOutValid_L signal pin, 3-5 
rules for, 4-34 
SysDc commands, 4-10 
system probes, with, 4-40 
SysDc field, system to 21264A commands, 4-27 
SYSDC_DELAY Cbox CSR, 4-31 
defined, 5-38 
SysFillValid_L signal pin, 3-5 
rules for, 4-34 
System clock ratio configuration, 7-4 
System initialization, 7-7 
System interface clocks, programming, 4-17 
System port, 4-16 
SysVref signal pin, 3-5 


T 


Tag parity errors, 8-2 
TB fill flow, 2-34, 6-14 
Tck_H signal pin, 3-6 
Tdi_H signal pin, 3-6 
Tdo_H signal pin, 3-6 


Temperatures 


maximum average per frequency, 10-2 
operating, 10-1 


Terminology, xix 
TestStat_H signal pin, 3-6 


purpose for, 11-4 
with BiST and SROM load, 7-6 


Thermal design characteristics, 10-6 
Tms_H signal pin, 3-6 


Traps 
load-load order, 2-31 
Mbox order, 2-31 
replay, 2-31 
store-load order, 2-31 
Trst_L signal pin, 3-6 
U 


UNALIGN fault, 6-13 
Unaligned convention, xx 


V 


VA virtual address register, 5-4 
at power-on reset state, 7-15 
VA_CTL virtual address control register, 5-4 


at power-on reset state, 7-15 
updating VA_48 field, D-17 


VA_FORM virtual address format register, 5-5 
at power-on reset state, 7-15 
VAF. See Victim address file 


VAX floating-point instruction opcodes, A-11 
VBIAS defined, 9-2 

VDB. See Victim data buffer 
VDBFlushRequest, 21264A command, 4-21 
VDD signal pin list, 3-16 

VDD, values for, 9-3 

VDF. See Victim data file 

Vdiff defined, 9-2 


Victim address file, described, 2-11 


Victim data buffer (VDB), 4-7 

Virtual address support, 1-2 

Virtual program counter logic, 2-2 
VPC. See Virtual program counter logic 
VREF, values for, 9-3 

VSS signal pin list, 3-16 


W 


WAIT_BiSI reset machine state, 7-18 
WAIT_BiST reset machine state, 7-18 
WAIT_ClkFwdRst0 reset machine state, 7-18 
WAIT_ClkFwdRst1 reset machine state, 7-18 
WAIT_INTERRUPT reset machine state, 7-19 
WAIT_NOMINAL reset machine state, 7-17 
WAIT_RESET reset machine state, 7-18 
WAIT_SETTLE reset machine state, 7-17 
WAKEUP interrupt, 6-14 

WAR, eliminating, 2-6 

Warm reset flow, 7-11 





WAW 

eliminating, 2-6 
WMB instruction processing, 2-33 
WO,n convention, xx 


Wrap order 


double-pumped, 4-36 
interleaved, 4-36 


WrBytes, 21264A command, 4-21, 4-38 


Write hint instructions, translation to external 
interface, 4-5 


WRITE_MANY chain, 5-38 
example, 5-39 
values for Bcache initialization, 7-12 


WRITE_MANY register 


after fault reset, 7-8 
after warm reset, 7-11 
through sleep mode, 7-10 


WRITE_ONCE chain description, 5-33 
Write-after-read. See WAR 
Write-after-write. See WAW 

WrLWs, 21264A command, 4-21, 4-38 
WrQWs, 21264A command, 4-21, 4-38 


WrVictimBlk, 21264A command, 4-21, 4-38 
system probes, with, 4-40 


X 


X convention, xxi 